Pooling is another operation or layer that can be used in a CNN. Pooling (also referred to as “subsampling” or “downsampling”) reduces the dimensions (e.g., the number of pixel values) of each feature map while retaining the most important information, such as the max, average, or sum of the values in the feature map. For example, in a max pooling embodiment, the largest element of a rectified feature map within a tile or group of pixels may be identified and used as the representative value for the entire tile or group. In another embodiment, the average (average pooling) or the sum of all elements in that tile or group could be used. In another embodiment, the pooling operation may use Distification, as described herein, to determine the horizontal, vertical, or depth coordinates associated with a feature map and use any of the horizontal, vertical, or depth coordinates as the representative value for an entire tile or group.
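By way of illustration only, the max, average, and sum pooling embodiments may be sketched as follows. The sketch assumes a two-dimensional feature map, non-overlapping square tiles, and NumPy; the function name `pool2d` and its parameters `tile` and `mode` are hypothetical and are not taken from this disclosure. Distification-based pooling is not shown because it depends on the technique described elsewhere herein.

```python
import numpy as np

def pool2d(feature_map, tile=2, mode="max"):
    """Pool a 2-D feature map over non-overlapping tile x tile groups.

    Hypothetical helper for illustration; rows and columns that do not
    fill a complete tile are dropped.
    """
    h, w = feature_map.shape
    h_out, w_out = h // tile, w // tile
    trimmed = feature_map[:h_out * tile, :w_out * tile]
    # Axes after reshape: (tile row, row within tile, tile column, column within tile).
    tiles = trimmed.reshape(h_out, tile, w_out, tile)
    if mode == "max":
        return tiles.max(axis=(1, 3))
    if mode == "average":
        return tiles.mean(axis=(1, 3))
    if mode == "sum":
        return tiles.sum(axis=(1, 3))
    raise ValueError(f"unsupported pooling mode: {mode}")

# Example rectified feature map (values chosen arbitrarily).
rectified = np.array([[1, 1, 2, 4],
                      [5, 6, 7, 8],
                      [3, 2, 1, 0],
                      [1, 2, 3, 4]], dtype=float)

pooled_max = pool2d(rectified, mode="max")      # [[6, 8], [3, 4]]
pooled_avg = pool2d(rectified, mode="average")  # [[3.25, 5.25], [2, 2]]
pooled_sum = pool2d(rectified, mode="sum")      # [[13, 21], [8, 8]]
```

In this example each 2×2 tile of the 4×4 feature map is replaced by a single representative value, so the pooled output has one quarter as many values as the input.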
Pooling reduces the spatial size of the input representation and provides several enhancements to the overall CNN model: it makes the input representations (feature dimension) smaller and more manageable; it reduces the number of parameters and computations in the network, thereby helping to control overfitting; and it makes the CNN resilient to small distortions and translations in the input image (e.g., because a small distortion in the input is unlikely to substantially change the maximum, average, or Distified value of the output feature map). Thus, pooling allows detection of features, such as items of interest, in an image despite variances among images of a certain class.
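The resilience to small translations may be illustrated with a minimal sketch, again assuming NumPy, non-overlapping 2×2 tiles, and a hypothetical helper named `max_pool`. A single strong activation is shifted by one pixel within its tile, and the max-pooled output is unchanged; a larger shift that crosses a tile boundary does change the output, so the resilience applies only to small displacements.

```python
import numpy as np

def max_pool(feature_map, tile=2):
    """Max-pool a 2-D feature map over non-overlapping tiles (hypothetical helper)."""
    h, w = feature_map.shape
    h_out, w_out = h // tile, w // tile
    tiles = feature_map[:h_out * tile, :w_out * tile].reshape(h_out, tile, w_out, tile)
    return tiles.max(axis=(1, 3))

# Feature map with one strong activation at row 0, column 0.
original = np.zeros((4, 4))
original[0, 0] = 9.0

# Shift by one pixel: the activation moves to column 1, still in the same tile.
shifted_small = np.roll(original, shift=1, axis=1)
# Shift by two pixels: the activation moves to column 2, in the neighboring tile.
shifted_large = np.roll(original, shift=2, axis=1)

pooled_original = max_pool(original)
unchanged_by_small_shift = np.array_equal(pooled_original, max_pool(shifted_small))
unchanged_by_large_shift = np.array_equal(pooled_original, max_pool(shifted_large))

# Pooling also reduces the number of values passed to later layers.
values_before, values_after = original.size, pooled_original.size
```

Here `unchanged_by_small_shift` is true and `unchanged_by_large_shift` is false, and the number of values drops from 16 to 4.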