













Neural networks and deep learning | Dmitry Kobak | Machine Learning I
Deep Learning by Goodfellow et al. defines ‘deep learning’ as algorithms enabling “the computer to learn complicated concepts by building them out of simpler ones”. This is implemented using neural networks that consist of hierarchically organized simple processing units, called neurons. The field has had a series of huge successes starting in ∼2012: classifying images, generating images, playing Go, writing text, folding proteins, etc., known together as the ‘deep learning revolution’. This lecture is about feed-forward neural networks for image classification.
We have already spent one entire lecture talking about a neural network classifier!
Logistic regression is a linear network that has an input layer and an output layer. We now add a hidden layer:

$$L = -\sum_i \Big[ y_i \log h(x_i) + (1 - y_i) \log\big(1 - h(x_i)\big) \Big]$$

Linear: $h(x) = \dfrac{1}{1 + e^{-W^2 W^1 x}}$

Nonlinear: $h(x) = \dfrac{1}{1 + e^{-W^2 \phi(W^1 x)}}$
We want to use some nonlinear activation function ϕ that is easy to work with. The most common choice:
$\phi(x) = \max(0, x).$
Such neurons are called rectified linear units (ReLU).
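A minimal NumPy sketch of this two-layer network, assuming a 10-dimensional input, 32 hidden ReLU units, and random weights (all sizes are arbitrary choices for the example):

```python
import numpy as np

def relu(z):
    # phi(z) = max(0, z), applied elementwise
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_hidden = 10, 32                      # assumed input and hidden-layer sizes
W1 = rng.normal(scale=0.1, size=(d_hidden, d_in))
W2 = rng.normal(scale=0.1, size=(1, d_hidden))
x = rng.normal(size=d_in)

h_linear = sigmoid(W2 @ (W1 @ x))            # no activation: collapses to one linear map W2 W1
h_nonlinear = sigmoid(W2 @ relu(W1 @ x))     # hidden ReLU layer makes the model nonlinear
print(h_linear, h_nonlinear)
```

Without the activation $\phi$, the product $W^2 W^1$ is just another linear map, so the hidden layer adds nothing; the nonlinearity is what gives the network extra expressive power.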
What we had above was a shallow network. Here is a deeper one:
$$L = -\sum_i \Big[ y_i \log h(x_i) + (1 - y_i) \log\big(1 - h(x_i)\big) \Big]$$

$$h(x) = \frac{1}{1 + e^{-W^4 \phi\big(W^3 \phi\big(W^2 \phi(W^1 x)\big)\big)}}$$
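A sketch of this deeper forward pass in NumPy, which is just repeated application of $\phi(W\,\cdot)$; the layer widths are chosen arbitrarily for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # h(x) = sigmoid(W^4 phi(W^3 phi(W^2 phi(W^1 x)))) for a list of 4 weight matrices
    a = x
    for W in weights[:-1]:
        a = relu(W @ a)                  # hidden layers use the ReLU activation
    return sigmoid(weights[-1] @ a)      # output layer uses the sigmoid

rng = np.random.default_rng(0)
sizes = [10, 64, 32, 16, 1]              # assumed widths: input, three hidden layers, output
weights = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=sizes[0])
print(forward(x, weights))
```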
Logistic regression gradient (see Lecture 5):
$$\nabla L = -\sum_i \big( y_i - h(x_i) \big)\, \nabla(\beta^\top x_i) = -\sum_i \big( y_i - h(x_i) \big)\, x_i.$$
The gradient for the deep network:
$$\nabla L = -\sum_i \big( y_i - h(x_i) \big)\, \nabla\Big[ W^4 \phi\Big( W^3 \phi\big( W^2 \phi(W^1 x_i)\big)\Big)\Big].$$
Let us write this term down with indices:
$$z = \sum_a W^4_{a}\, \phi\bigg[ \sum_b W^3_{ab}\, \phi\bigg( \sum_c W^2_{bc}\, \phi\Big( \sum_d W^1_{cd}\, x_{id} \Big)\bigg)\bigg].$$
Now we need to use the chain rule: if $h(x) = f(g(x))$, then $h'(x) = f'(g(x))\, g'(x)$.
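Applying the chain rule layer by layer is exactly what backpropagation does. A minimal NumPy sketch for the shallow two-layer network from above, with a finite-difference check of one gradient entry (the layer sizes and the check are assumptions for the example; deeper networks simply repeat the same backward step for each layer):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, W1, W2):
    # Forward pass, keeping the intermediate values needed for the backward pass.
    z1 = W1 @ x                          # hidden pre-activations
    a1 = relu(z1)                        # hidden activations
    h = sigmoid(W2 @ a1)                 # predicted probability (length-1 array)
    loss = -(y * np.log(h) + (1 - y) * np.log(1 - h))

    # Backward pass: repeated application of the chain rule.
    dz2 = h - y                          # dL/dz2 for sigmoid output + cross-entropy loss
    dW2 = np.outer(dz2, a1)              # dL/dW2
    da1 = W2.T @ dz2                     # dL/da1
    dz1 = da1 * (z1 > 0)                 # dL/dz1; the ReLU derivative is 0 or 1
    dW1 = np.outer(dz1, x)               # dL/dW1
    return loss.item(), dW1, dW2

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(8, 5))
W2 = rng.normal(scale=0.1, size=(1, 8))
x, y = rng.normal(size=5), 1.0

loss, dW1, dW2 = forward_backward(x, y, W1, W2)

# Finite-difference check of a single entry of dW1.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
loss_p, _, _ = forward_backward(x, y, W1p, W2)
print(dW1[0, 0], (loss_p - loss) / eps)  # the two numbers should agree closely
```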
Using backpropagation, we can compute the gradient (partial derivatives with respect to each weight) and use gradient descent. Two notes:
If there are K classes, then the last softmax layer has K output neurons:
$$P(y = k) = \frac{e^{z_k}}{\sum_j e^{z_j}},$$

where $z_j$ are the pre-nonlinearity activations: $z = W^L \phi\big(W^{L-1} \phi(\ldots)\big)$. The cross-entropy loss function can be written as

$$L = -\sum_{i=1}^{n} \sum_{k=1}^{K} Y_{ik} \log P(y_i = k),$$

where $Y_{ik} = 1$ if $y_i = k$ and $0$ otherwise.
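A short NumPy sketch of the softmax layer and this loss, assuming a small batch and $K = 3$ classes (the sizes are arbitrary):

```python
import numpy as np

def softmax(Z):
    # Rows are examples, columns are classes; subtract the row max for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(Z, y, K):
    # Z: (n, K) pre-softmax activations z; y: (n,) integer labels in 0..K-1
    P = softmax(Z)                        # P[i, k] = P(y_i = k)
    Y = np.eye(K)[y]                      # one-hot matrix: Y[i, k] = 1 iff y_i = k
    return -(Y * np.log(P)).sum()         # L = -sum_i sum_k Y_ik log P(y_i = k)

rng = np.random.default_rng(0)
n, K = 4, 3                               # assumed batch size and number of classes
Z = rng.normal(size=(n, K))
y = rng.integers(0, K, size=n)
print(cross_entropy(Z, y, K))
```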
[Figure: the input image has three input channels.]
[Figure: several feature maps.]
[Figure: first layer (Krizhevsky et al. 2012).]
[Figure: hidden layer (Olah et al. 2017).]
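Assuming these figures illustrate a convolutional layer (three input channels mapped to a number of feature maps), here is a small PyTorch sketch; the layer sizes are arbitrary and not those of the networks shown:

```python
import torch
import torch.nn as nn

# A batch of one RGB image: (batch, channels, height, width).
x = torch.randn(1, 3, 32, 32)

# A convolutional layer mapping the 3 input channels to 16 feature maps
# with 3x3 filters; each feature map has its own set of filters.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
feature_maps = torch.relu(conv(x))

print(feature_maps.shape)   # torch.Size([1, 16, 32, 32]): 16 feature maps of 32x32 each
```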
Ridge ($L_2$) regularization: a penalty $\lambda \lVert W^l \rVert^2$ on each layer $l$. This is also called weight decay (see Lecture 4). Another method is called early stopping.

Remark: for linear regression one can show that early stopping penalizes smaller singular values more strongly, just as the ridge penalty does.
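A sketch of both methods in a small PyTorch training loop; the toy data, model, and hyperparameters are placeholder assumptions for illustration:

```python
import torch
import torch.nn as nn

# Toy binary classification data (assumed, purely for illustration).
torch.manual_seed(0)
X = torch.randn(200, 20)
y = (X[:, 0] > 0).float().unsqueeze(1)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.BCEWithLogitsLoss()

# Ridge / L2 regularization: the `weight_decay` argument implements the L2 penalty
# (applied here to all parameters rather than per layer).
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-4)

# Early stopping: monitor the validation loss and stop once it no longer improves.
best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    if val_loss < best_val - 1e-4:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break   # validation loss has stopped improving
```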
Modern neural networks are typically used in the overparametrized regime, i.e. they can perfectly or near-perfectly fit the training data. This can be demonstrated by training them on randomly shuffled labels: they still achieve zero training loss. At the same time, generalization performance can be high: ‘benign’ overfitting due to implicit regularization. Sometimes one sees overfitting in the test loss but not in the test accuracy.
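A sketch of the shuffled-labels experiment on synthetic data (the data, architecture, and optimizer here are arbitrary assumptions; the point is only that the training loss can still be driven towards zero):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 32)
y_true = X[:, :10].argmax(dim=1)                    # labels that genuinely depend on the inputs
y_shuffled = y_true[torch.randperm(len(y_true))]    # shuffled labels: no relation to X anymore

# Overparametrized MLP: far more parameters than training points.
model = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y_shuffled)
    loss.backward()
    optimizer.step()

print(loss.item())   # training loss on meaningless labels still approaches zero
```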