What is Contrastive Divergence?

Because an RBM is an undirected neural network, contrastive divergence was devised to adjust its weights. Take the input (visible) nodes and the hidden nodes as an example:

The RBM computes the hidden nodes from the input nodes using weights that are initially assigned at random. The hidden nodes then reconstruct the input nodes.
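The forward pass and the reconstruction described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming binary nodes and a sigmoid activation; the names `hidden_probs`, `visible_probs`, `W`, `b`, and `c`, as well as the layer sizes, are hypothetical choices made for this example.

```python
import math
import random

def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for each hidden node j; W[i][j] links visible i to hidden j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    """P(v_i = 1 | h) for each visible node i, using the same weights W."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    """Draw a binary state for each node from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]

rng = random.Random(0)
n_visible, n_hidden = 4, 3  # assumed sizes, for illustration only
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible  # visible biases
c = [0.0] * n_hidden   # hidden biases

v0 = [1, 0, 1, 0]                         # original input nodes
h0 = sample(hidden_probs(v0, W, c), rng)  # hidden nodes computed from the input
v1_probs = visible_probs(h0, W, b)        # reconstruction; W is unchanged
```

Each reconstructed probability depends on every hidden node, and each hidden node depends on every visible node, so `v1_probs` generally does not match `v0` even though `W` never changed.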

Note that the weights stay the same as in the initial state, while the reconstructed input nodes differ from their initial state. Why are the input nodes different? Because the input nodes are not connected to each other. For example, consider the 2nd visible node.

Because the hidden nodes are affected not only by the 2nd visible node but also by the other visible nodes, they do not fit the 2nd visible node perfectly. As a result, the visible nodes reconstructed from the hidden nodes differ from the originals even though the weights are the same. This difference between the original and the reconstructed visible nodes is the reason contrastive divergence exists.

The RBM keeps alternating between the visible and hidden nodes until the reconstructed visible nodes stop changing from one pass to the next. This alternating sampling process is called Gibbs sampling.
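The repeated back-and-forth sampling can be sketched as a short chain. This is a minimal sketch assuming binary nodes; `gibbs_step`, `gibbs_chain`, and the fixed number of steps `k` are hypothetical names and parameters chosen for illustration (a fixed step count stands in for "until the nodes stop changing").

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One full Gibbs step: sample h given v, then sample a new v given h."""
    n_v, n_h = len(b), len(c)
    h = [1 if rng.random() < sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(n_v))) else 0
         for j in range(n_h)]
    v_new = [1 if rng.random() < sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(n_h))) else 0
             for i in range(n_v)]
    return v_new, h

def gibbs_chain(v0, W, b, c, k, rng):
    """Run k Gibbs steps starting from v0; return every visible state visited."""
    states = [list(v0)]
    v = list(v0)
    for _ in range(k):
        v, _h = gibbs_step(v, W, b, c, rng)
        states.append(v)
    return states

rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(4)]
b = [0.0] * 4  # visible biases
c = [0.0] * 3  # hidden biases
chain = gibbs_chain([1, 0, 1, 0], W, b, c, k=5, rng=rng)
```

Here `chain[0]` is the original input and `chain[-1]` is the visible state after five alternations; the weights are only read, never updated, during sampling.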

How to adjust weights in contrastive divergence

First, suppose the graph represents the randomly initialized weights:

The formula in the image above shows how the weights affect the $\log$ probability.
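For reference, in a binary RBM the dependence of the log probability on a single weight is commonly written as the gradient below. The notation is assumed here and may differ from the image: $w_{ij}$ is the weight between visible node $v_i$ and hidden node $h_j$, and $\langle \cdot \rangle$ denotes an average taken either over the training data or over samples from the model.

$$
\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}
$$

The first term can be computed directly from the input nodes, while the second term is the one that would require running the sampling to convergence.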

In an RBM, energy is defined through the weights. That is, the weights dictate the shape of the energy. So, let’s see what happens during the steps of contrastive divergence.
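To make the link between weights and energy concrete, here is a small sketch using the energy function usually given for a binary RBM, $E(\mathbf{v}, \mathbf{h}) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i w_{ij} h_j$. The function name `energy`, the bias names `b` and `c`, and the numbers are assumptions made for this example.

```python
def energy(v, h, W, b, c):
    """Energy of a joint state (v, h) in a binary RBM with weights W and biases b, c."""
    visible_term = sum(b[i] * v[i] for i in range(len(v)))
    hidden_term = sum(c[j] * h[j] for j in range(len(h)))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)

v = [1, 0]        # visible state
h = [1]           # hidden state
b = [0.5, 0.5]    # visible biases
c = [1.0]         # hidden bias

e_before = energy(v, h, [[2.0], [3.0]], b, c)  # -3.5
e_after = energy(v, h, [[4.0], [3.0]], b, c)   # -5.5
```

Raising the weight between two nodes that are both active lowers the energy of that state, which is how changing the weights reshapes the energy curve.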

First, let’s say the green dot represents the initial weights, which are randomly initialized.

The red dot represents the weights of the reconstructed visible nodes.

Through contrastive divergence, the dot moves toward the lowest energy.

At the end of contrastive divergence, the location of the dot will be:

However, with Hinton’s shortcut, just two steps are enough to see how to adjust the curve at an early stage, without waiting for the sampling to converge. To do this, adjust the weights so that the green dot has the lowest energy, like this:

And the result will be:
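The shortcut can be sketched as a single weight update that compares the data with its first reconstruction. This is a minimal sketch, assuming that the "two steps" correspond to the original input and one reconstruction (often called CD-1); the function name `cd1_update`, the learning rate `lr`, and the training pattern are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_hidden(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def p_visible(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v0, W, b, c, lr, rng):
    """Apply one CD-1 update in place for a single training vector v0."""
    ph0 = p_hidden(v0, W, c)               # statistics driven by the data
    h0 = sample(ph0, rng)
    v1 = sample(p_visible(h0, W, b), rng)  # one reconstruction, no convergence
    ph1 = p_hidden(v1, W, c)               # statistics driven by the reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            # Lower the energy of the data, raise the energy of the reconstruction.
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1

rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(3)] for _ in range(4)]
b = [0.0] * 4  # visible biases
c = [0.0] * 3  # hidden biases
pattern = [1, 1, 0, 0]  # hypothetical training vector
for _ in range(200):
    cd1_update(pattern, W, b, c, lr=0.1, rng=rng)
```

After repeated updates on the same pattern, the biases of the visible nodes that are on in `pattern` can only have grown and those of the nodes that are off can only have shrunk, so the training pattern ends up in a lower-energy region than before.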
