In some embodiments, various numbers and permutations of the convolution, non-linearity, and pooling layers may be used for a CNN model. For example, in one embodiment, a 128×96 pixel image may be used as input for the model. A first convolution operation may include applying 32 3×3 filters to detect the edges in the image. A max pooling operation may analyze 2×2 tile portions of the output of the first convolution operation to determine the maximum value of each tile portion. A ReLU function may then be applied to the pooled image data to introduce non-linearity. A second convolution operation may then be applied, for example using 64 3×3 filters, to determine the interior features of the image. Together, these operations can extract the useful features from the images (e.g., items of interest), introduce non-linearity in the CNN model, and reduce the feature dimensions to enhance computing performance. The above operations can be repeated any number of times for a single CNN. For example, some CNNs may have tens of convolution and pooling layers. In addition, the ordering of the convolution, non-linearity, and pooling operations may differ. For example, it is not necessary to have a pooling operation after every convolution operation.
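By way of illustration only, the example sequence above (convolution, max pooling, ReLU, second convolution) may be sketched as follows to show how the feature dimensions change at each stage. The sketch rests on assumptions that the description does not specify: stride-1 convolutions without padding, non-overlapping 2×2 pooling tiles, a single-channel input, and a (height, width, channels) layout in which the 128×96 image is 128 pixels wide and 96 pixels tall. The function names are hypothetical and chosen for this sketch.

```python
# Illustrative sketch only. Assumed (not stated in the description):
# stride-1 convolutions with no padding, non-overlapping 2x2 pooling,
# a single-channel input, and a (height, width, channels) layout.

def conv_shape(shape, num_filters, kernel=3):
    """Output shape of an unpadded, stride-1 convolution."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, num_filters)

def pool_shape(shape, tile=2):
    """Output shape of non-overlapping tile x tile pooling."""
    h, w, c = shape
    return (h // tile, w // tile, c)

def max_pool_2x2(grid):
    """Maximum value of each non-overlapping 2x2 tile of a 2-D list."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]) - 1, 2)]
        for r in range(0, len(grid) - 1, 2)
    ]

def relu(grid):
    """Replace negative values with zero, leaving others unchanged."""
    return [[max(0, v) for v in row] for row in grid]

def trace_shapes(input_shape=(96, 128, 1)):
    """Follow the example ordering: conv -> max pool -> ReLU -> conv."""
    shapes = {"input": input_shape}
    shapes["conv1"] = conv_shape(shapes["input"], 32)  # 32 3x3 filters
    shapes["pool1"] = pool_shape(shapes["conv1"])      # 2x2 tiles
    shapes["relu1"] = shapes["pool1"]                  # ReLU keeps the shape
    shapes["conv2"] = conv_shape(shapes["relu1"], 64)  # 64 3x3 filters
    return shapes

if __name__ == "__main__":
    for stage, shape in trace_shapes().items():
        print(stage, shape)
```

Under these assumptions, pooling roughly halves each spatial dimension while the number of filters sets the channel count of each convolution output. A different padding or stride choice would yield different dimensions.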