What is the Convolution Operation?

The convolution of two functions $f$ and $g$ is defined as:

$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
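For sampled data the integral becomes a sum over indices. The following is a minimal sketch of that discrete version in plain Python; the function name `convolve_1d` and the sample values are assumptions made for illustration.

```python
def convolve_1d(f, g):
    """Discrete analogue of the integral: out[t] = sum over tau of f[tau] * g[t - tau]."""
    n_out = len(f) + len(g) - 1
    out = [0] * n_out
    for t in range(n_out):
        for tau in range(len(f)):
            # Only add terms where the index t - tau falls inside g
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out


signal = [1, 2, 3]   # hypothetical sample values
kernel = [1, 1]      # hypothetical sample values
result = convolve_1d(signal, kernel)
print(result)  # [1, 3, 5, 3]
```

Swapping the two arguments gives the same result, because convolution is commutative.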

In a convolutional neural network, convolution is the application of a filter, called a feature detector (or kernel), to an input; each application produces an activation.
Let’s assume we have input image data:

And a feature detector:

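We create a feature map by sliding the feature detector over the input image. At each position, the patch of the image under the detector is multiplied element-wise by the detector, and the products are summed into a single value of the feature map: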

Continuing this, we get the feature map:
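The sliding computation can be sketched in plain Python. The 5×5 image and 3×3 detector below are made-up values chosen for illustration, and `feature_map` is a hypothetical helper name. As in most deep learning libraries, the detector is not flipped, so strictly speaking this computes a cross-correlation.

```python
def feature_map(image, detector):
    """Slide a square detector over the image (stride 1, no padding)."""
    k = len(detector)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    fmap = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Element-wise product of the image patch and the detector, then sum
            total = sum(image[i + a][j + b] * detector[a][b]
                        for a in range(k) for b in range(k))
            row.append(total)
        fmap.append(row)
    return fmap


# Hypothetical binary image containing an X-shaped pattern
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
# Hypothetical feature detector with the same X shape
detector = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

fmap = feature_map(image, detector)
for row in fmap:
    print(row)
# [2, 0, 2]
# [0, 5, 0]
# [2, 0, 2]
```

The centre value is 5 because all five non-zero cells of the detector line up with the pattern there.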

For a batch of $N$ inputs and $C_{out}$ feature detectors, the output of a 2D convolutional layer has shape (N, $C_{out}$, $H_{out}$, $W_{out}$), where:

$H_{out} = \lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} \rfloor + 1$

$W_{out} = \lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} \rfloor + 1$
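
As a quick check of the formula, take some assumed values (not from the example above): $H_{in} = 28$, padding 1, dilation 1, kernel size 3, and stride 2. Then:

$H_{out} = \lfloor \frac{28 + 2 \times 1 - 1 \times (3 - 1) - 1}{2} \rfloor + 1 = \lfloor 13.5 \rfloor + 1 = 14$

With stride 1 and the other values unchanged, the result is $27 + 1 = 28$, so a kernel of size 3 with padding 1 preserves the input size.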

  
import math

def cnn_calculator(w_in, h_in, kernel_size, padding=(0,0), stride=(1,1), dilation=(1,1)):
  # Index 0 holds the height parameters and index 1 the width parameters, matching the formulas above
  h_out = math.floor((h_in + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0]) + 1
  w_out = math.floor((w_in + 2*padding[1] - dilation[1]*(kernel_size[1] - 1) - 1) / stride[1]) + 1
  return w_out, h_out

Why do we use a feature detector?

First, applying the feature detector reduces the size of the image: without padding, the feature map is smaller than the input.
Second, it detects features in specific parts of the image. For example, if the feature detector encodes a pattern, the feature map takes its highest value where the image matches that pattern.
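
Both effects can be seen in a small sketch. The 6×6 image, the vertical-bar detector, and the helper name `cross_correlate` are assumptions made for illustration; the negative weights are one possible way to penalise pixels that lie off the bar.

```python
def cross_correlate(image, detector):
    """Apply the detector at every valid position (stride 1, no padding)."""
    kh, kw = len(detector), len(detector[0])
    return [
        [sum(image[i + a][j + b] * detector[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]


# Hypothetical detector for a vertical bar
detector = [
    [-1, 1, -1],
    [-1, 1, -1],
    [-1, 1, -1],
]

# Hypothetical 6x6 image with a vertical bar in column 4, rows 2 to 4
image = [[0] * 6 for _ in range(6)]
for r in (2, 3, 4):
    image[r][4] = 1

fmap = cross_correlate(image, detector)

# The feature map is smaller than the input: 6x6 -> 4x4
print(len(image), len(image[0]), "->", len(fmap), len(fmap[0]))

# Find the largest activation and where it occurs (value, row, column)
peak = max((v, i, j) for i, row in enumerate(fmap) for j, v in enumerate(row))
print(peak)  # (3, 2, 3)
```

The peak sits at row 2, column 3, which is the top-left corner of the 3×3 window whose middle column covers the bar exactly.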

Is there only one feature map in the convolutional layer?

No, unlike the example, there are many feature maps in a convolutional layer.

Because an image has many features, a single feature map cannot represent it. Each feature map represents one of the features in the image. For example, here are versions of a Taj Mahal image produced by different feature detectors.
Blur version:

Edge detected version:

Then why do we keep so many versions? Because the network can then learn which feature maps are most useful for the task it is trained on.
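
A minimal sketch of how one image yields several feature maps, one per detector. The image, the two kernels (a box blur and a simple vertical-edge kernel), and the helper name `apply_detector` are assumptions chosen for illustration, not the filters used for the Taj Mahal images.

```python
def apply_detector(image, detector, divisor=1):
    """Apply one detector at every valid position and scale the result."""
    kh, kw = len(detector), len(detector[0])
    return [
        [sum(image[i + a][j + b] * detector[a][b]
             for a in range(kh) for b in range(kw)) / divisor
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]


# Hypothetical image: dark on the left, bright on the right
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# Each entry is (kernel, divisor); both kernels are illustrative choices
detectors = {
    "blur": ([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9),
    "vertical_edge": ([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], 1),
}

# One feature map per detector
feature_maps = {
    name: apply_detector(image, kernel, divisor)
    for name, (kernel, divisor) in detectors.items()
}

for name, fmap in feature_maps.items():
    print(name, fmap[0])
# blur [3.0, 6.0, 9.0]
# vertical_edge [27.0, 27.0, 0.0]
```

The blur map smooths the jump from dark to bright, while the edge map responds strongly only where the jump occurs.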

ReLU layer?

In the ReLU layer, we apply the rectifier function $\phi(x) = \max(x, 0)$ to break the linearity of the convolutional layer. We want to break linearity because images are highly non-linear. However, the convolution operation itself is linear (each value of the feature map is a weighted sum of input pixels), so building feature maps with feature detectors alone risks imposing linearity. That's why the ReLU layer is applied.
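
The rectifier is simple to sketch in Python. The feature-map values below are made up for illustration, and `relu_map` is a hypothetical helper name. The last lines show what breaking linearity means: for a linear function, applying it to a sum gives the same result as summing the outputs, and ReLU fails that check.

```python
def relu(x):
    """Rectifier: keep positive values, replace negative values with 0."""
    return max(x, 0)


def relu_map(fmap):
    """Apply the rectifier to every value of a feature map."""
    return [[relu(v) for v in row] for row in fmap]


# Hypothetical feature map with positive and negative activations
fmap = [
    [-3, 0, 2],
    [5, -1, 4],
]
activated = relu_map(fmap)
print(activated)  # [[0, 0, 2], [5, 0, 4]]

# Linearity check: a linear function would satisfy f(a + b) == f(a) + f(b)
a, b = 3, -5
print(relu(a + b), relu(a) + relu(b))  # 0 3
```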
