What is Gradient Descent?
Gradient Descent is a method to minimize the cost $C$.
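A common choice in tutorials like this one, assumed here since it fits the rest of the discussion, is the squared error between the prediction $\hat{y}$ and the actual value $y$:

$$C = \frac{1}{2}(\hat{y} - y)^2$$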
The graph of $C$ is shown below.
Let’s assume that the current value of $C$ is at the red point. We calculate the gradient of $C$ at that point. The gradient tells us which direction is downhill, so the red point rolls naturally down the slope and comes to rest at the minimum of the curve.
Each step moves the weights a small amount against the gradient, $w \leftarrow w - \eta \frac{\partial C}{\partial w}$, where $\eta$ is the learning rate. By repeating this update until $C$ is minimized, you will get well-fitted weights.
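To make this loop concrete, here is a minimal sketch in Python. It assumes a one-weight model $\hat{y} = wx$, the squared-error cost above, and illustrative values for the data and learning rate:

```python
# Minimal gradient descent sketch for a one-weight model y_hat = w * x
# with the squared-error cost C = 0.5 * (y_hat - y) ** 2.
x, y = 2.0, 8.0       # a single training example (illustrative values)
w = 0.0               # initial weight: the "red point" on the curve
learning_rate = 0.05  # step size eta (illustrative value)

for step in range(100):
    y_hat = w * x              # prediction
    grad = (y_hat - y) * x     # dC/dw by the chain rule
    w -= learning_rate * grad  # roll downhill: w <- w - eta * dC/dw

print(w)  # approaches 4.0, the weight that minimizes the cost
```

Each pass through the loop is one "roll" of the red point; the gradient shrinks as the point nears the bottom, so the steps get smaller and the weight settles at the minimum.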
What is Stochastic Gradient Descent?
Stochastic Gradient Descent compensates for a disadvantage of Gradient Descent: when the cost function is not convex (when it has more than one valley), Gradient Descent can get stuck in a local minimum instead of finding the global one.
Gradient Descent trains on the entire dataset and then updates the weights once, whereas Stochastic Gradient Descent updates the weights each time a single row of the dataset is trained, as the sketch below illustrates.
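To make the contrast concrete, here is a minimal sketch in Python, reusing the one-weight model and squared-error cost from above; the dataset, learning rate, and number of epochs are illustrative:

```python
import random

# Tiny dataset of (x, y) rows for the one-weight model y_hat = w * x.
data = [(1.0, 4.0), (2.0, 8.0), (3.0, 12.0)]  # illustrative values
learning_rate = 0.02

def grad(w, x, y):
    """Gradient of C = 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x

# (Batch) Gradient Descent: one weight update per pass over the dataset.
w_batch = 0.0
for epoch in range(100):
    avg_grad = sum(grad(w_batch, x, y) for x, y in data) / len(data)
    w_batch -= learning_rate * avg_grad  # single update per epoch

# Stochastic Gradient Descent: one weight update per row, in shuffled order.
w_sgd = 0.0
for epoch in range(100):
    random.shuffle(data)  # visit rows in random order
    for x, y in data:
        w_sgd -= learning_rate * grad(w_sgd, x, y)  # update after every row

print(w_batch, w_sgd)  # both approach 4.0 on this (convex) toy problem
```

This toy cost is convex, so both versions reach the same minimum; the difference shows up on non-convex costs, where the noise in SGD's per-row updates can knock the weights out of a shallow local minimum.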