In one embodiment, for each image or item of image data, a CNN can use four main operations (i.e., layers of the CNN): convolution, non-linearity, pooling, and classification. The convolution operation can extract features from an input image. Typically, convolution preserves the spatial relationship between pixels of an image by learning image features from small squares of input data (such as pixels or groups of pixels of the image). The input data is taken from different portions (e.g., tiles or squares) of the original image, and each portion is processed with a small matrix of learned weights that may be described as a “feature detector” (i.e., a “filter” or a “kernel”). The convolution operation applies (i.e., “slides”) the filter across the pixels of the original image to generate one or more respective “convolved features” (i.e., “activation maps” or “feature maps”) that describe the image. In this manner, the filters act as feature detectors for the original input image, and the resulting feature maps may be used to determine items of interest.
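By way of a non-limiting illustration, the sliding of a filter across an image to produce a feature map can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular embodiment: the function name `convolve2d`, the 4x4 example image, and the hand-picked `vertical_edge` filter are all hypothetical, no padding is applied, and the filter weights are fixed here rather than learned during training.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Slide `kernel` across `image` and return the resulting feature map.

    No padding is applied, so the output is smaller than the input. As in most
    CNN libraries, the kernel is not flipped (strictly, a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    feature_map: Matrix = []
    for top in range(0, ih - kh + 1, stride):
        row = []
        for left in range(0, iw - kw + 1, stride):
            # Multiply the filter element-wise with the tile beneath it and sum.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

In this example, each row of the 3x3 feature map is `[0.0, 2.0, 0.0]`: the filter responds only at the column where the dark and bright regions meet, which illustrates how a filter can act as a detector for one kind of feature while preserving where in the image that feature occurs.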