As described below, for a new image, the output probabilities are calculated using the weights that have been optimized to correctly classify all of the previous training examples. For example, with respect to driver behavior detection, in one embodiment, training and testing a CNN may include taking a large image data set, such as 72,000 images of 81 drivers. The input data set may be split into training and validation sets (e.g., a total of 67,000 images of 75 drivers) and a test set (e.g., 5,000 images of 6 drivers). The filters and weights of a raw CNN model may first be initialized with random values. Using the training set (e.g., 60,000 images of 71 drivers) as input, the CNN may then be forward propagated by applying the convolution, ReLU, pooling, and fully connected operations to each training image to determine output probabilities for each of a number of classifications. For example, the output probabilities for the classes “safe driving,” “texting,” and “calling” could be 0.6, 0.1, and 0.3, respectively. Since the weights were randomly assigned in the first instance, the output probabilities would also be essentially random and would likely contain error. An error for each image can then be determined by comparing the predicted class probabilities to the actual class that the image belongs to, and a total error of the model may be computed from these individual errors. At this point, a “backpropagation” technique can be used to calculate the gradient of the total error with respect to every filter value and weight in the network. All filter values and weights in the CNN may then be updated, each adjusted in proportion to its contribution to the total error, so as to minimize the total error of the model.
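The counts in the example above are consistent with a split in which each driver's images fall into exactly one subset (75 + 6 = 81 drivers; 67,000 + 5,000 = 72,000 images), which would leave 7,000 images of 4 drivers for validation once the 60,000-image, 71-driver training set is taken out. A minimal sketch of such a driver-disjoint split follows, assuming each image is tagged with a driver identifier. The function name `split_by_driver`, the seed, and the synthetic data are hypothetical illustrations, not part of the described embodiment.

```python
import random
from collections import defaultdict


def split_by_driver(samples, n_test_drivers, n_val_drivers, seed=0):
    """Split (image_id, driver_id) pairs so no driver appears in more than one subset.

    Hypothetical helper: assumes every sample carries a driver identifier.
    """
    # Group image ids by the driver who appears in them.
    by_driver = defaultdict(list)
    for image_id, driver_id in samples:
        by_driver[driver_id].append(image_id)

    # Shuffle the drivers (not the images) reproducibly.
    drivers = sorted(by_driver)
    random.Random(seed).shuffle(drivers)

    test_drivers = drivers[:n_test_drivers]
    val_drivers = drivers[n_test_drivers:n_test_drivers + n_val_drivers]
    train_drivers = drivers[n_test_drivers + n_val_drivers:]

    def collect(driver_ids):
        return [(img, d) for d in driver_ids for img in by_driver[d]]

    return collect(train_drivers), collect(val_drivers), collect(test_drivers)


# Synthetic stand-in data: 81 drivers with 10 images each.
samples = [(f"img_{d}_{i}", d) for d in range(81) for i in range(10)]
train, val, test = split_by_driver(samples, n_test_drivers=6, n_val_drivers=4)
print(len({d for _, d in train}), len({d for _, d in val}), len({d for _, d in test}))  # 71 4 6
```

Splitting by driver rather than by image is one plausible reading of the driver counts given. It keeps the test set made up of people the model has never seen, so test accuracy reflects generalization to new drivers rather than familiarity with the training drivers.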
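The training cycle described above consists of random initialization, a forward pass to class probabilities, comparison against each image's actual class, a total error, and weight updates proportional to each weight's contribution to that error. That cycle can be illustrated on a much smaller model. The sketch below is a deliberate simplification and not the CNN itself. It trains only a single fully connected layer with a softmax output on hypothetical two-element feature vectors. It assumes mean cross-entropy as the total error and plain gradient descent as the update rule. The class names come from the text; the features, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random

CLASSES = ["safe driving", "texting", "calling"]


def softmax(logits):
    # Subtract the max for numerical stability; the outputs sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, x):
    # Fully connected layer (one logit per class), then softmax to probabilities.
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(logits)


def total_error(weights, biases, data):
    # Mean cross-entropy: penalizes low probability on each example's actual class.
    return sum(-math.log(forward(weights, biases, x)[y]) for x, y in data) / len(data)


def train(data, n_features, lr=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    # 1. Initialize the weights with small random values.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in CLASSES]
    biases = [0.0 for _ in CLASSES]
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in CLASSES]
        grad_b = [0.0] * len(CLASSES)
        for x, y in data:
            # 2. Forward pass: predicted class probabilities for this example.
            probs = forward(weights, biases, x)
            # 3. Backpropagation: for softmax + cross-entropy,
            #    dError/dlogit_k = p_k - (1 if k is the actual class else 0).
            for k, p in enumerate(probs):
                delta = p - (1.0 if k == y else 0.0)
                grad_b[k] += delta
                for j, xj in enumerate(x):
                    grad_w[k][j] += delta * xj
        # 4. Move each parameter against its averaged gradient, i.e. in
        #    proportion to its contribution to the total error.
        n = len(data)
        for k in range(len(CLASSES)):
            biases[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases


# Hypothetical two-feature inputs, one small cluster per class index.
data = [([1.0, 0.0], 0), ([0.9, 0.1], 0),
        ([0.0, 1.0], 1), ([0.1, 0.9], 1),
        ([1.0, 1.0], 2), ([0.9, 0.9], 2)]
before = total_error(*train(data, n_features=2, epochs=0), data)
weights, biases = train(data, n_features=2)
after = total_error(weights, biases, data)
print(f"total error before: {before:.3f}, after: {after:.3f}")
```

In a real CNN, the same gradient signal would continue backward through the pooling, ReLU, and convolution stages so that the filter values are also updated. The sketch stops at the fully connected layer to stay short and dependency-free.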