



UNDERSTANDING GRADIENT DESCENT AND BACK-PROPAGATION
To define the relationship between gradient descent and back-propagation, we must first understand what gradient descent is. Consider a single-layered neural network: here x1 and x2 form the input layer (often called layer 0) and ŷ is the output. This is a network with a single layer of weights, which can also be described as a single-layer neural network or a single perceptron. Let us look at the expression for the predicted output ŷ: it is computed as the dot product of the weight matrix w with the input x, plus the bias b:

ŷ = w·x + b
Here w is the weight matrix and b is the bias of the network. This expression is a linear function: with just one perceptron it behaves exactly like linear regression. Since our neural network should not be limited to linear data, we apply a non-linear activation function so the network can work through the non-linearity of the problem and reach better accuracy. Let σ be the activation function; the expression for the network then becomes:

ŷ = σ(w·x + b)

Letting z = w·x + b, the expression changes to:
ŷ^(i) = σ(z^(i))

Here σ can be a ReLU, sigmoid, or tanh function. Let us now formulate a cost function (skipping the derivation): the cost averages a per-example loss L over the m training examples,

J(W) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i))
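To make this concrete, here is a minimal sketch of the forward pass and cost for such a single-perceptron network. The data, weights, sigmoid activation and squared-error loss below are illustrative assumptions, not values from the document:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical training data: m = 3 examples with features x1, x2.
X = np.array([[0.2, 0.7],
              [0.9, 0.1],
              [0.4, 0.5]])
y = np.array([1.0, 0.0, 1.0])      # targets
w = np.array([0.3, -0.6])          # weight vector (assumed values)
b = 0.1                            # bias

z = X @ w + b                      # linear part: z = w.x + b
y_hat = sigmoid(z)                 # non-linear activation: y_hat = sigma(z)

# Cost J(W): average of a per-example loss L(y_hat, y); squared error assumed.
J = np.mean(0.5 * (y_hat - y) ** 2)
print(J)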
Gradient descent tries to minimise this loss by finding the minimum (ideally the global minimum) of the cost function. To do so it takes small steps away from the current weights in the direction of the negative gradient, which is why the gradient-descent expression subtracts a small quantity from the weight w:

w ← w − η · ∂J(W)/∂w

where η is the learning rate. Repeating this update moves the weights toward the minimum.
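As a toy illustration of this update rule (the quadratic loss and learning rate are assumptions chosen only to show the mechanics), a sketch of plain gradient descent:

# Toy loss J(w) = (w - 3)^2 with gradient dJ/dw = 2(w - 3);
# its minimiser is w = 3 (illustrative example, not from the document).
def grad_J(w):
    return 2.0 * (w - 3.0)

w = 0.0        # initial weight
eta = 0.1      # learning rate: scales the "small step"
for step in range(100):
    w = w - eta * grad_J(w)   # subtract a small multiple of the gradient

print(w)       # close to 3.0, the minimum of J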
Now let us understand back-propagation. Consider a small network with one input x, one hidden unit z1 and output ŷ, as in Fig. GD-02 below. The question is: if we move a weight by a very small amount, how does that affect our loss/cost function J(W)? That is exactly what the gradient measures here. Applying the chain rule:
∂J(W)/∂w2 = ∂J(W)/∂ŷ · ∂ŷ/∂w2

∂J(W)/∂w1 = ∂J(W)/∂ŷ · ∂ŷ/∂z1 · ∂z1/∂w1

Fig. GD-02: computation graph x → z1 → ŷ → J(W) (credit: MIT course 6.S191)
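The same chain rule can be checked numerically: nudge a weight by a tiny amount and see how the loss changes, which should agree with the analytic gradient. A minimal sketch for the network of Fig. GD-02, assuming z1 = w1·x, ŷ = σ(w2·z1) with a sigmoid σ and a squared-error loss (all values hypothetical):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w1, w2, x, y):
    z1 = w1 * x                    # hidden value
    y_hat = sigmoid(w2 * z1)       # output
    return 0.5 * (y_hat - y) ** 2  # squared-error loss (assumed)

x, y = 0.5, 1.0                    # hypothetical example
w1, w2 = 0.8, -0.4                 # hypothetical weights

# Analytic gradients via the chain rule (back-propagation by hand).
z1 = w1 * x
y_hat = sigmoid(w2 * z1)
dJ_dyhat = y_hat - y                       # dJ/d(y_hat)
dsig = y_hat * (1.0 - y_hat)               # sigmoid derivative at w2*z1
dJ_dw2 = dJ_dyhat * dsig * z1              # dJ/dw2 = dJ/dy_hat * dy_hat/dw2
dJ_dw1 = dJ_dyhat * dsig * w2 * x          # dJ/dw1 = dJ/dy_hat * dy_hat/dz1 * dz1/dw1

# Numerical check: move each weight a tiny amount and watch how J changes.
eps = 1e-6
num_dw1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
num_dw2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
print(dJ_dw1, num_dw1)             # should match closely
print(dJ_dw2, num_dw2)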
Since w1 sits one layer deeper, its chain also includes the partial derivative with respect to z1. We did this for the simple network above; for a large deep neural network the same chain-rule expansion is repeated recursively, layer by layer, and that recursive application of the chain rule is exactly what the expression of back-propagation is. Theoretically, then, the relationship between gradient descent and back-propagation is this: back-propagation computes the gradients ∂J(W)/∂w of the loss with respect to every weight, and gradient descent uses those gradients to update the weights and minimise the loss.
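A short training-loop sketch of this relationship, again under the assumptions of the previous snippet (sigmoid activation, squared-error loss, hypothetical data and learning rate): back-propagation produces the gradients and gradient descent consumes them.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 0.5, 1.0                # hypothetical training example
w1, w2 = 0.8, -0.4             # hypothetical initial weights
eta = 0.5                      # learning rate (assumed)

for step in range(500):
    # Forward pass through the tiny network x -> z1 -> y_hat -> J(W).
    z1 = w1 * x
    y_hat = sigmoid(w2 * z1)

    # Back-propagation: chain rule gives the gradient of the loss for each weight.
    dJ_dyhat = y_hat - y
    dsig = y_hat * (1.0 - y_hat)
    dJ_dw2 = dJ_dyhat * dsig * z1
    dJ_dw1 = dJ_dyhat * dsig * w2 * x

    # Gradient descent: use those gradients to take a small step downhill.
    w2 -= eta * dJ_dw2
    w1 -= eta * dJ_dw1

# The loss shrinks as training proceeds.
print(w1, w2, 0.5 * (sigmoid(w2 * w1 * x) - y) ** 2)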