
UNDERSTANDING GRADIENT DESCENT AND BACK-PROPAGATION

Typology: Assignments

2020/2021

Uploaded on 05/20/2021

SHREYANSH-SINHA


DEEP-LEARNING

To define the relationship between gradient descent and back-propagation, we must first understand what gradient descent is. Let us consider a single-layer neural network: here X1 and X2 form the input layer (often called layer 0) and ŷ is the output. Such a network, with a single layer of weights, can also be described as a single-layer neural network or a single-perceptron network. Let us look at the expression for the predicted output ŷ: it is computed as the dot product of the weight matrix with the input X, plus a bias b.

𝑦̂ = W X + b

Here W is the weight matrix and b is the bias of the network. This expression is a linear function: with a single perceptron it behaves exactly like linear regression. Since real problems are not restricted to linear relationships, we apply a non-linear activation function so that the network can model the non-linearity of the problem and improve accuracy. Let σ be the activation function; the expression for the network then becomes:

𝑦̂ = 𝜎 (W X + b)

Let z = W X + b. Then the expression becomes:

𝑦̂ = 𝜎 (z)

Here σ can be a ReLU, sigmoid, or tanh function; applied to each training example i, ŷ^(i) = σ(z^(i)). Let us now formulate a cost function (skipping the derivation):

Cost function: J(W, b) = (1/m) · Σ_{i=1}^{m} ℓ(ŷ^(i), y^(i))

[Figure: the single-perceptron network, with inputs X, weights W, and output ŷ]
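As an illustration of the expressions above, here is a minimal NumPy sketch of the forward pass ŷ = σ(W X + b) and of the mean cost J(W, b). The sigmoid activation, the squared-error loss, and the toy data are assumptions made only for this sketch; they are not prescribed by the assignment.

```python
import numpy as np

def sigmoid(z):
    # One possible choice for the activation sigma (ReLU or tanh would also work).
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, b, X):
    # y_hat = sigma(W X + b), computed for every example (one example per row of X).
    z = X @ W + b
    return sigmoid(z)

def cost(W, b, X, y):
    # J(W, b) = (1/m) * sum_i l(y_hat_i, y_i); squared error is used as the loss l here.
    y_hat = forward(W, b, X)
    return np.mean((y_hat - y) ** 2)

# Toy data with two inputs x1, x2 per example, matching the single-perceptron figure.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
W = np.zeros(2)
b = 0.0
print(cost(W, b, X, y))   # cost at the initial weights
```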

Gradient descent tries to minimise this loss by finding the global minimum of the cost function. To do so, it repeatedly takes small steps from the current weights in the direction that decreases J, which is why the gradient-descent update subtracts a small quantity (proportional to the gradient) from the weight W, moving step by step towards the global minimum.

The size of each small step taken by gradient descent is controlled by the learning rate α, giving the update rule W ← W − α · ∂J(W, b)/∂W.
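The following is a minimal sketch of that update rule, w ← w − α · dJ/dw, on a toy one-dimensional cost. The quadratic cost J(w) = (w − 3)² and the value of α are illustrative assumptions, chosen only to show the role of the learning rate.

```python
def J(w):
    # Toy convex cost standing in for the network's loss; its minimum is at w = 3.
    return (w - 3.0) ** 2

def dJ_dw(w):
    # Analytic gradient of the toy cost.
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate: the size of each small step
w = 0.0       # initial weight
for _ in range(100):
    w = w - alpha * dJ_dw(w)   # gradient-descent update: w <- w - alpha * dJ/dw
print(w)   # close to 3, the global minimum of J
```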

Now let us understand back-propagation. Consider a neural network with one perceptron and one input x (Fig GD-02 below): if we change a weight by a very small amount, how does that change the loss/cost function J(W)? That is exactly what the gradient of J with respect to that weight tells us. Applying the chain rule:

For w2:

∂J(W)/∂w2 = ∂J(W)/∂ŷ · ∂ŷ/∂w2

Now for w1:

∂J(W)/∂w1 = ∂J(W)/∂ŷ · ∂ŷ/∂z1 · ∂z1/∂w1

[Fig GD-02: the network x →(w1)→ z1 →(w2)→ ŷ → J(W); credit: MIT course 6.S191]

Since w1 sits one step deeper in the network, the partial derivative with respect to z1 also has to be considered. This was for the very simple network above; for a large deep neural network the same chain-rule step is repeated recursively, layer by layer. That is exactly what the back-propagation expression is; a small sketch of the computation is given below.
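Here is a minimal sketch of those two chain-rule expressions for the network x → z1 → ŷ → J(W) of Fig GD-02. Linear units and a squared-error loss are assumptions made here so that the derivatives stay short, and the function name backprop_two_weights is purely illustrative.

```python
def backprop_two_weights(x, y, w1, w2):
    # Forward pass through x --(w1)--> z1 --(w2)--> y_hat, then the loss J.
    z1 = w1 * x
    y_hat = w2 * z1
    J = (y_hat - y) ** 2

    # Backward pass: the two chain-rule expressions from the text.
    dJ_dyhat = 2.0 * (y_hat - y)       # dJ/dy_hat for the squared-error loss
    dJ_dw2 = dJ_dyhat * z1             # dJ/dw2 = dJ/dy_hat * dy_hat/dw2
    dJ_dw1 = dJ_dyhat * w2 * x         # dJ/dw1 = dJ/dy_hat * dy_hat/dz1 * dz1/dw1
    return J, dJ_dw1, dJ_dw2

print(backprop_two_weights(x=1.5, y=1.0, w1=0.4, w2=0.7))
```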

Theoretically, the relationship between gradient descent and back-propagation is:

  • As the expressions above show, back-propagation can be considered a subset of gradient descent: it is the implementation of gradient descent in multi-layer neural networks.
  • Since the same training rule occurs recursively in each layer of the network, the contribution of each weight to the total error can be calculated backwards, from the output layer to the input layer (see the sketch after this list).
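The sketch below shows that recursive rule for a deeper network: the same chain-rule step is applied at every layer, passing the error from the output layer back towards the input layer. The sigmoid activations, squared-error loss, random toy weights, and the omission of biases are all simplifying assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(weights, x, y):
    # Forward pass, storing every layer's activation for reuse in the backward pass.
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))

    y_hat = activations[-1]
    # Error signal at the output layer (squared-error loss, sigmoid derivative folded in).
    delta = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)

    grads = [None] * len(weights)
    # Walk backwards, applying the same chain-rule step at every layer.
    for layer in reversed(range(len(weights))):
        grads[layer] = np.outer(delta, activations[layer])       # dJ/dW[layer]
        if layer > 0:
            a = activations[layer]
            delta = (weights[layer].T @ delta) * a * (1.0 - a)   # pass the error one layer down
    return grads

# Hypothetical 2-layer network: 2 inputs -> 3 hidden units -> 1 output (biases omitted).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
grads = backprop(weights, np.array([0.5, -1.0]), np.array([1.0]))
print([g.shape for g in grads])   # one gradient matrix per layer: [(3, 2), (1, 3)]
```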

References:

  • Massachusetts Institute of Technology (MIT) deep-learning course 6.S191 – https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-s191-introduction-to-deep-learning-january-iap-2020/#
  • DeepLearning.AI course on Coursera – https://www.coursera.org/learn/neural-networks-deep-learning