
Pooling

What is Pooling?

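Without pooling, a CNN can struggle to recognize the same object when it appears with small differences, such as a slightly different angle or saturation. To reduce this sensitivity, we add a pooling layer to the CNN, which summarizes each small region of a feature map with a single value.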

Suppose we apply max pooling to the feature map produced by a convolutional layer.

feature map:



We use a $2 \times 2$ pixel box for pooling. From the top left pixel of the feature map, $2 \times 2$ pixel boxes are extracted in order without overlapping.
Max pooling involves finding and recording the maximum value in each pixel box.
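
The procedure described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `max_pool_2x2` and the 4x4 values are made up for the example, and the sketch assumes the feature map has an even height and width so that every $2 \times 2$ box fits completely.

```python
def max_pool_2x2(feature_map):
    """Max-pool a 2D list with a 2x2 box and stride 2 (boxes do not overlap)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Visit the top-left corner of each 2x2 box, moving two pixels at a time.
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            box = [feature_map[r][c], feature_map[r][c + 1],
                   feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            # Record the maximum value found in this box.
            pooled_row.append(max(box))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map; the values are chosen only for illustration.
example = [
    [1, 3, 2, 0],
    [4, 6, 1, 5],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(max_pool_2x2(example))  # [[6, 5], [7, 9]]
```

Each output value is the largest of the four pixels in its box, so the 4x4 map shrinks to 2x2.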



Note that pooling can still proceed when a pixel box extends past the edge of the feature map, as shown below:
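
One way to read this is that an overhanging box takes its maximum over only the pixels it actually covers. The sketch below assumes that interpretation; `max_pool_clipped` is a hypothetical helper and the 3x3 values are invented. Libraries differ on this point: the floor-based size formula in this post corresponds to dropping incomplete boxes, while PyTorch's `MaxPool2d`, for example, includes them only when `ceil_mode=True`.

```python
def max_pool_clipped(feature_map, size=2):
    """Max-pool with non-overlapping size x size boxes, clipping boxes at the edges."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, size):
        pooled_row = []
        for c in range(0, cols, size):
            # Slicing stops at the edge, so an overhanging box just holds fewer pixels.
            box = [v for row in feature_map[r:r + size] for v in row[c:c + size]]
            pooled_row.append(max(box))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 3x3 feature map: the last row and column only partly fill their boxes.
example = [
    [1, 3, 2],
    [4, 6, 1],
    [7, 2, 9],
]
print(max_pool_clipped(example))  # [[6, 2], [7, 9]]
```

Here the right-hand boxes cover a single column and the bottom boxes a single row, yet each still yields a maximum.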



The output has shape (N, $C_{out}$, $H_{out}$, $W_{out}$). Pooling leaves the batch size N and the number of channels unchanged, and the spatial sizes are:

$H_{out} = \left\lfloor \frac{H_{in}\ +\ 2\ \times\ padding[0]\ -\ dilation[0]\ \times\ (kernelsize[0]\ -\ 1)\ -\ 1}{stride[0]} \right\rfloor\ + 1$

$W_{out} = \left\lfloor \frac{W_{in}\ +\ 2\ \times\ padding[1]\ -\ dilation[1]\ \times\ (kernelsize[1]\ -\ 1)\ -\ 1}{stride[1]} \right\rfloor\ + 1$
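
As a worked example, assume an input with $H_{in} = 5$ and $W_{in} = 6$ (sizes chosen only for illustration), pooled with a $2 \times 2$ kernel, stride 2, no padding, and dilation 1:

$H_{out} = \left\lfloor \frac{5\ +\ 2\ \times\ 0\ -\ 1\ \times\ (2\ -\ 1)\ -\ 1}{2} \right\rfloor\ + 1 = \left\lfloor \frac{3}{2} \right\rfloor + 1 = 2$

$W_{out} = \left\lfloor \frac{6\ +\ 2\ \times\ 0\ -\ 1\ \times\ (2\ -\ 1)\ -\ 1}{2} \right\rfloor\ + 1 = \left\lfloor \frac{4}{2} \right\rfloor + 1 = 3$

With these settings the floor discards the leftover fifth row, so only two complete boxes fit vertically.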
import math

def cnn_calculator(w_in, h_in, kernel_size, padding=(0,0), stride=(1,1), dilation=(1,1)):
  # Index 0 of each tuple is the height axis and index 1 is the width axis, as in the formulas above.
  h_out = math.floor((h_in + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0]) + 1
  w_out = math.floor((w_in + 2*padding[1] - dilation[1]*(kernel_size[1] - 1) - 1) / stride[1]) + 1
  return w_out, h_out
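
The floor formula can be sanity-checked along a single axis by counting how many box positions actually fit. The sketch below is only an illustration: `out_size` and `count_windows` are hypothetical helpers, not part of any library, and they treat one axis at a time with scalar arguments.

```python
import math


def out_size(n_in, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis, using the floor formula from this post."""
    return math.floor((n_in + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1


def count_windows(n_in, kernel, stride=1, padding=0, dilation=1):
    """Count start positions whose box still lies fully inside the padded axis."""
    padded = n_in + 2 * padding
    span = dilation * (kernel - 1) + 1  # pixels from the first to the last tap of a box
    count, start = 0, 0
    while start + span <= padded:
        count += 1
        start += stride
    return count


# The closed-form result should agree with the brute-force count.
for n_in in range(4, 12):
    for kernel in (2, 3):
        for stride in (1, 2, 3):
            assert out_size(n_in, kernel, stride) == count_windows(n_in, kernel, stride)

print(out_size(5, kernel=2, stride=2))  # 2
```

The agreement shows where the floor comes from: a box that would run past the padded edge is simply not counted.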
This post is licensed under CC BY 4.0 by the author.