What is Policy Gradient? Policy Gradient is used to update the actor model using the Q-values learned by the critic model. The goal of the policy gradient for the policy $\pi_{\phi}$ is to optimize the expecte...
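To make the actor update concrete, here is a minimal sketch for a single state with three discrete actions. It assumes a softmax actor (a stochastic policy, swapped in for simplicity; deterministic actors such as the one in TD3 instead follow the gradient of Q with respect to the action) and a hypothetical critic whose Q-values are already fixed. The actor performs gradient ascent on $J = \sum_a \pi(a) Q(a)$, whose gradient with respect to logit $b$ is $\pi(b)\,(Q(b) - \sum_a \pi(a) Q(a))$.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, q_values, lr):
    """One gradient-ascent step on J = sum_a pi(a) * Q(a) for a softmax actor."""
    probs = softmax(logits)
    baseline = sum(p * q for p, q in zip(probs, q_values))   # expected Q under the policy
    # dJ/dlogit_b = pi(b) * (Q(b) - E_pi[Q])
    grads = [p * (q - baseline) for p, q in zip(probs, q_values)]
    return [z + lr * g for z, g in zip(logits, grads)]

# Hypothetical Q-values that a critic has learned for three actions in one state.
q_values = [1.0, 3.0, 2.0]
logits = [0.0, 0.0, 0.0]                 # actor starts with a uniform policy
for _ in range(200):
    logits = policy_gradient_step(logits, q_values, lr=0.5)

probs = softmax(logits)
```

The actor's probability mass shifts toward the action the critic rates highest (index 1 here).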
Markov Decision Process (MDP)
What is MDP? Since few things are 100% certain in the real world, a more advanced formulation than the Bellman Equation is needed, one that accounts for uncertain outcomes. The MDP was devised for this reason. Assume there is a case where you nee...
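One way to see what the MDP adds is a Bellman backup that averages over uncertain outcomes: $V(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma V(s')]$. The sketch below runs value iteration on a hypothetical four-state corridor in which each move succeeds with probability 0.8 and otherwise leaves the agent in place; the slip probability, the exit reward, and $\gamma = 0.9$ are illustrative assumptions.

```python
GAMMA = 0.9
N_STATES = 4          # hypothetical corridor: states 0..3, state 3 is the exit (terminal)
GOAL = 3
SLIP = 0.2            # assumed probability that a move fails and the agent stays put

def transitions(state, action):
    """Return [(probability, next_state, reward)] for a move of -1 (left) or +1 (right)."""
    target = min(max(state + action, 0), N_STATES - 1)
    outcomes = [(1.0 - SLIP, target), (SLIP, state)]
    return [(p, s2, 1.0 if s2 == GOAL else 0.0) for p, s2 in outcomes]

def value_iteration(tol=1e-10):
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue              # terminal state keeps value 0
            # Expected backup: weight each possible outcome by its probability.
            best = max(
                sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))
                for a in (-1, +1)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
```

With the slip, the value next to the exit settles at $0.8/0.82 \approx 0.976$ rather than the 1.0 a fully deterministic move would give.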
Bellman Equation
What is the Bellman Equation? The Bellman Equation expresses the value of a decision problem at a certain point in time in terms of the payoff from the initial choices and the value of the remaining decisio...
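In symbols, the deterministic form is $V(s) = \max_a [R(s, a) + \gamma V(s')]$, where $s'$ is the state that action $a$ leads to. The recursive sketch below evaluates it on a small hypothetical decision problem; the graph, rewards, and $\gamma = 0.9$ are made up for illustration, and the graph is assumed to have no cycles so the recursion terminates.

```python
from functools import lru_cache

GAMMA = 0.9
# Hypothetical decision problem: state -> {action: (next_state, immediate_reward)}
GRAPH = {
    "start": {"detour": ("middle", 0.0), "shortcut": ("end", 0.5)},
    "middle": {"finish": ("end", 1.0)},
    "end": {},                          # no decisions remain
}

@lru_cache(maxsize=None)
def value(state):
    """V(s) = max_a [R(s, a) + GAMMA * V(s')]; 0 when no actions remain."""
    actions = GRAPH[state]
    if not actions:
        return 0.0
    return max(reward + GAMMA * value(nxt) for nxt, reward in actions.values())

def best_action(state):
    """Pick the action whose immediate reward plus discounted remaining value is largest."""
    def q(action):
        nxt, reward = GRAPH[state][action]
        return reward + GAMMA * value(nxt)
    return max(GRAPH[state], key=q)
```

Here the `detour` pays nothing up front but leads to a remaining problem worth 1.0, so its total $0 + 0.9 \times 1.0 = 0.9$ beats the `shortcut`'s immediate 0.5.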
Reinforcement Learning
What is Reinforcement Learning? In reinforcement learning, an agent takes an action in the environment, the environment changes its state, and the agent receives a reward. This process is repeat...
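The agent-environment loop can be sketched as follows. `CorridorEnv` and its `reset`/`step` methods are hypothetical names chosen for this example: the agent starts at position 0, moves left or right, and receives a reward of 1 when it reaches the right end.

```python
import random

class CorridorEnv:
    """Hypothetical environment: positions 0..length-1, exit at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (next_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def run_episode(env, policy, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                    # the agent acts
        state, reward, done = env.step(action)    # the environment changes state and emits a reward
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps

def always_right(state):
    return +1

rng = random.Random(0)

def random_policy(state):
    return rng.choice((-1, +1))
```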
Q Learning
What is Q Learning? Q-Learning is almost the same as the MDP. One key difference from the MDP is that Q-Learning applies Temporal Difference (TD) updates so that changes happen gradually. Why do the changes happen ...
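The gradual change comes from the TD update $Q(s, a) \leftarrow Q(s, a) + \alpha\,[r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$: each update moves the estimate only a fraction $\alpha$ of the way toward the new target. Below is a minimal tabular sketch on a hypothetical five-state corridor; $\alpha = 0.5$, $\gamma = 0.9$, and the ε-greedy exploration rate of 0.1 are arbitrary choices.

```python
import random

N_STATES = 5                 # hypothetical corridor: positions 0..4, state 4 is the exit
ACTIONS = (-1, +1)           # left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection, breaking ties randomly
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s2, r, done = step(s, a)
            # TD target: bootstrap from the best next action unless s2 is terminal
            target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])   # move only part of the way
            s = s2
    return Q

Q = q_learning()
```

After training, moving right has the higher Q-value in every non-terminal state.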
Living Penalty
What is a Living Penalty? A living penalty is applied to avoid infinite loops in Reinforcement Learning. Let’s assume that we are trying to find a way out of a maze. At first, we set the living ...
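A quick calculation shows why the penalty helps. The sketch assumes an undiscounted return, an exit reward of +1, and a penalty of -0.04 on every other step (all illustrative values): without the penalty, a long wandering route earns exactly as much as the shortest one, so nothing pushes the agent out of loops.

```python
def episode_return(steps_to_exit, exit_reward=1.0, living_penalty=-0.04):
    """Undiscounted return of a path that reaches the exit on its last step.

    Every step before the last incurs the living penalty; the last step earns exit_reward.
    """
    return living_penalty * (steps_to_exit - 1) + exit_reward

def wandering_return(steps, living_penalty=-0.04):
    """Return accumulated by wandering for `steps` steps without ever exiting."""
    return living_penalty * steps

# Without a living penalty, short and long routes look identical to the agent.
no_penalty = (episode_return(6, living_penalty=0.0), episode_return(20, living_penalty=0.0))
# With the (assumed) -0.04 penalty, the shorter route is clearly better.
with_penalty = (episode_return(6), episode_return(20))
```

With the penalty, the 6-step route returns 0.8 and the 20-step route only 0.24, while endless wandering keeps losing reward instead of costing nothing.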
Softmax
What is Softmax? The Softmax function converts predicted values into numbers between 0 and 1 whose sum is 1. For example, consider a CNN that outputs predicted values. As ...
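The usual definition is $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. A short sketch follows; subtracting the maximum score before exponentiating does not change the result but avoids overflow for large scores. The three scores are hypothetical CNN outputs.

```python
import math

def softmax(values):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    m = max(values)                              # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # hypothetical raw CNN outputs for three classes
probs = softmax(scores)    # ≈ [0.659, 0.242, 0.099]
```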
Deep Autoencoder
What is a Deep Autoencoder? A deep autoencoder is a stack of RBMs (Restricted Boltzmann Machines).
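A minimal sketch of the idea, assuming binary (Bernoulli) units and one-step contrastive divergence (CD-1) training: each RBM is trained on the hidden activations of the one below it, the trained stack then acts as the encoder, and the same weights are reused in reverse as the decoder. The layer sizes, learning rate, and toy data are arbitrary, and the backpropagation fine-tuning that usually follows this pretraining is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible   # visible biases
        self.b_h = [0.0] * n_hidden    # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)   # one Gibbs step back to the visible layer
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1[i])
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])

def pretrain_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM learns from the layer below's activations."""
    rng = random.Random(seed)
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return rbms

def encode(rbms, v):
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v

def decode(rbms, h):
    for rbm in reversed(rbms):     # unroll the stack: same weights, opposite direction
        h = rbm.visible_probs(h)
    return h

# Hypothetical toy data: three 6-dimensional binary patterns.
data = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]]
rbms = pretrain_stack(data, layer_sizes=[4, 2])
```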
Stacked Autoencoder
What is a Stacked Autoencoder? A Stacked Autoencoder is like an autoencoder with an additional hidden layer; that is, it involves two-step encoding and one-step decoding.
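Following the description above (two encoding steps, one decoding step), a forward pass can be sketched as below. The layer sizes (6 → 4 → 2 → 6), sigmoid activations, and random untrained weights are illustrative assumptions; training would adjust the weights to minimize reconstruction error.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(weights, biases, x):
    """Fully connected layer with sigmoid activation: sigmoid(W x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Hypothetical untrained layer: small random weights, zero biases."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

rng = random.Random(0)
enc1 = random_layer(6, 4, rng)   # first encoding step: 6 -> 4
enc2 = random_layer(4, 2, rng)   # second encoding step: 4 -> 2 (the code)
dec = random_layer(2, 6, rng)    # single decoding step: 2 -> 6

def forward(x):
    h1 = dense(*enc1, x)
    code = dense(*enc2, h1)
    reconstruction = dense(*dec, code)
    return code, reconstruction
```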
Contractive Autoencoder
What is a Contractive Autoencoder? A Contractive Autoencoder is a type of autoencoder designed to address the problem of having more hidden nodes than input nodes. A Contractive Autoencoder uses the en...
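The contractive penalty commonly used with this kind of autoencoder is the squared Frobenius norm of the encoder's Jacobian, $\|J_h(x)\|_F^2 = \sum_{i,j} (\partial h_j / \partial x_i)^2$, added to the reconstruction loss with a weight $\lambda$. For a sigmoid encoder $h = \sigma(Wx + b)$ it has a closed form, sketched below with a hypothetical encoder of 4 hidden units and 3 inputs (more hidden than input nodes); the weights, biases, and $\lambda$ are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(W, b, x):
    """Sigmoid encoder h = sigmoid(W x + b); W has one row per hidden unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def contractive_penalty(W, b, x):
    """Squared Frobenius norm of the Jacobian dh/dx, evaluated at x.

    For a sigmoid encoder dh_j/dx_i = h_j * (1 - h_j) * W[j][i], so the norm
    factorizes as sum_j (h_j * (1 - h_j))**2 * sum_i W[j][i]**2.
    """
    h = encode(W, b, x)
    return sum((hj * (1.0 - hj)) ** 2 * sum(w * w for w in row) for hj, row in zip(h, W))

def cae_loss(x, reconstruction, W, b, lam=0.1):
    """Squared reconstruction error plus the weighted contractive penalty."""
    recon_error = sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruction))
    return recon_error + lam * contractive_penalty(W, b, x)

# Hypothetical over-complete encoder: 4 hidden units, 3 inputs.
W = [[0.5, -1.0, 0.25],
     [1.5, 0.3, -0.7],
     [-0.2, 0.8, 0.6],
     [0.9, -0.4, 1.1]]
b = [0.1, -0.2, 0.0, 0.3]
x = [0.2, -0.4, 0.9]
```

Penalizing the Jacobian discourages the hidden representation from changing much under small input perturbations, so the extra hidden units cannot simply copy the input.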