A CNN can learn the values of the filters on its own during the training process, as described herein. Typically, the more filters, the more image features are extracted and the better the CNN becomes at recognizing patterns or features in images. The size of a feature map can be controlled by parameters determined before the convolution is performed. These parameters can include the “depth,” or number of filters used in the convolution operation, each of which produces a different feature map. Feature maps may be envisioned as stacked 2D matrices of the image, so that a convolution using three filters would produce a feature map with a depth of three. Another parameter can be the “stride” value, which is the number of pixels by which a filter slides over the image; a larger stride produces smaller feature maps. Another parameter relates to “zero-padding,” which is a method of padding the input image with zeros around the border. Padding allows control over the size of the feature maps.
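The interplay of depth, stride, and zero-padding can be illustrated with a minimal NumPy sketch. This is an illustrative example only, assuming a single-channel input and square filters; the function and variable names (e.g., conv2d) are hypothetical and not drawn from any particular library.

```python
import numpy as np

def conv2d(image, filters, stride=1, padding=0):
    """Convolve a single-channel image with a stack of filters.

    image:   (H, W) array
    filters: (D, F, F) array -- D filters of size F x F (a "depth" of D)
    Returns a (D, H_out, W_out) stack of feature maps.
    """
    # Zero-padding: pad the input image with zeros around the border.
    if padding > 0:
        image = np.pad(image, padding, mode="constant")
    D, F, _ = filters.shape
    H, W = image.shape
    # Output size after padding: (dimension - F) // stride + 1,
    # so a larger stride yields a smaller feature map.
    H_out = (H - F) // stride + 1
    W_out = (W - F) // stride + 1
    out = np.zeros((D, H_out, W_out))
    for d in range(D):                      # one feature map per filter
        for i in range(H_out):
            for j in range(W_out):
                r, c = i * stride, j * stride
                # Element-wise multiplication and addition (the linear step).
                out[d, i, j] = np.sum(image[r:r + F, c:c + F] * filters[d])
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
filters = np.random.randn(3, 3, 3)          # three filters -> depth of three
maps = conv2d(image, filters, stride=2, padding=1)
print(maps.shape)                           # (3, 3, 3): three stacked 2D feature maps
```

With a 6x6 input, 3x3 filters, a stride of 2, and zero-padding of 1, each feature map shrinks to 3x3; setting stride=1 in the same call would instead produce 6x6 maps, consistent with the size relationships described above.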
Non-linearity is another operation or layer that can be used in a CNN. This operation is used to introduce non-linearity into a CNN model because most real-world images and image data are non-linear. In contrast, the convolution operation is linear, consisting of element-wise matrix multiplication and addition. Accordingly, non-linearity can be introduced into the model via a non-linear function such as ReLU, Tanh, or Sigmoid to improve the accuracy of the prediction model. For example, ReLU stands for Rectified Linear Unit; it is an element-wise operation (applied per pixel) that can replace all negative pixel values in the feature map with another value, such as zero. The output feature map of the ReLU function can be referred to as the ‘Rectified’ feature map.
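The element-wise character of these non-linear functions can be shown in a short sketch, continuing the NumPy example above. The relu helper name is illustrative; the Tanh and Sigmoid lines use standard NumPy operations for comparison.

```python
import numpy as np

def relu(feature_map):
    """Element-wise ReLU: replace every negative pixel value with zero."""
    return np.maximum(feature_map, 0.0)

feature_map = np.array([[ 1.5, -0.3],
                        [-2.0,  4.0]])

rectified = relu(feature_map)            # the 'Rectified' feature map
print(rectified)                         # [[1.5 0. ]
                                         #  [0.  4. ]]

# Other non-linearities are applied the same way, per pixel:
print(np.tanh(feature_map))              # Tanh, squashes values into (-1, 1)
print(1.0 / (1.0 + np.exp(-feature_map)))  # Sigmoid, squashes into (0, 1)
```

Each function operates on every pixel of the feature map independently; ReLU in particular zeroes out the negative entries while passing positive entries through unchanged.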