What is an RNN?
An RNN (recurrent neural network) is a type of supervised deep learning model used for time series analysis. The basic structure of an RNN looks like this:
The hidden layers (the blue circles) not only produce output but also feed their own activations back into themselves at the next time step.
So, how does an RNN update its weights? Like an ANN, an RNN uses backpropagation; unrolled over the sequence, this is called backpropagation through time (BPTT).
However, some problems arise when updating the recurrent weights ($W_{rec}$).
Because $W_{rec}$ is usually less than 1, the gradient is multiplied by a factor smaller than 1 at every time step as it flows backward, so the weights associated with early time steps receive tiny updates and take many epochs to train. Put another way, the further back an input lies in the sequence, the smaller its effect on the output becomes.
This is called the vanishing gradient problem. Conversely, if $W_{rec}$ is greater than 1, the gradient grows at every step; this is called the exploding gradient problem.
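To see this numerically, here is a tiny scalar sketch. (In real RNNs $W_{rec}$ is a matrix and the relevant quantity is the size of its largest eigenvalue, but the scalar case shows the effect; the step count of 50 is an arbitrary choice.)

```python
# Tiny scalar illustration (not full BPTT): the gradient reaching time
# step t - k has been multiplied by W_rec once per step, i.e. ~ W_rec**k.
for w_rec in (0.9, 1.1):
    grad = 1.0
    for _ in range(50):      # propagate back through 50 time steps
        grad *= w_rec        # one multiplication by W_rec per step
    print(f"W_rec = {w_rec}: gradient after 50 steps ~ {grad:.4f}")

# W_rec = 0.9 -> ~0.0052  (vanishing)
# W_rec = 1.1 -> ~117.39  (exploding)
```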
Below, I will introduce ways to solve the vanishing and exploding gradient problems.
Structure of various RNNs
- One to Many
It can be used to generate a sentence from a single picture (image captioning).
- Many to One
It can be used to classify whether a sentence is positive or negative (sentiment analysis).
- Many to Many
It can be used for translation. A minimal sketch of these three input/output patterns follows this list.
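Here is a minimal sketch of how the three patterns differ in input/output shape, using PyTorch's `nn.RNN`; the layer sizes and heads below are arbitrary placeholders, not values from this article.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 10, 8)        # batch of 1, sequence of 10 steps
outputs, h_n = rnn(x)            # outputs: (1, 10, 16), h_n: (1, 1, 16)

# Many to One: use only the last hidden state, e.g. for sentiment.
sentiment_head = nn.Linear(16, 2)
sentiment_logits = sentiment_head(outputs[:, -1, :])   # (1, 2)

# Many to Many: apply an output layer at every time step. (Real
# translation uses encoder-decoder models; this is the simplest,
# equal-length form.)
token_head = nn.Linear(16, 100)
per_step_logits = token_head(outputs)                  # (1, 10, 100)

# One to Many: encode a single input (e.g. an image feature vector)
# as the initial hidden state, then generate outputs step by step.
image_feature = torch.randn(1, 1, 16)                  # placeholder encoding
step_input = torch.zeros(1, 1, 8)                      # "start" input
outputs, h_n = rnn(step_input, image_feature)          # repeat, feeding back
```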
Ways to solve the vanishing and exploding gradient problems
Vanishing
There are three common ways to solve the vanishing gradient problem (the first and third are sketched after this list):
- Weight initialization
- Echo state networks
- Long Short-Term Memory Networks (LSTM)
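Below is a minimal sketch of the first and third remedies in PyTorch; all sizes are placeholder values. (Echo state networks work differently, training only the output weights, and are not shown here.)

```python
import torch
import torch.nn as nn

# Weight initialization: initialize the recurrent weights W_rec so that
# repeated multiplication keeps gradient magnitudes near 1; orthogonal
# initialization is one common choice.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
nn.init.orthogonal_(rnn.weight_hh_l0)

# LSTM: the gated, additive cell state (c_n) gives gradients a path that
# is not repeatedly squashed by W_rec, mitigating vanishing.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 100, 8)            # batch of 4, 100 time steps
outputs, (h_n, c_n) = lstm(x)         # outputs: (4, 100, 16)
```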
Exploding
There are three common ways to handle the exploding gradient problem (gradient clipping is sketched after this list):
- Truncated Backpropagation
- Penalties
- Gradient Clipping
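As an example, here is a minimal sketch of gradient clipping inside a PyTorch training step; the model, data, and `max_norm` value are placeholders.

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(4, 50, 8)             # placeholder batch of sequences
target = torch.randn(4, 50, 16)       # placeholder targets

optimizer.zero_grad()
outputs, _ = model(x)
loss = loss_fn(outputs, target)
loss.backward()
# Rescale the gradients if their overall norm exceeds max_norm, so a
# single exploding gradient cannot destabilize training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```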