The predict actions from each of the 2D model and the 3D model can then be returned to the ensemble model for further analysis. For example, the ensemble model's “predict” function may use the returned 2D and 3D predict actions to generate a predict data structure (e.g., a multi-dimensional array) that may include the 2D and 3D predict actions, real actions (if available), a person index (e.g., if driver identifiers were used), and a timestamp for each predict action, corresponding to the timestamp of the image on which the 2D or 3D model prediction was made. The predict data structure can be, for example, a NumPy record array, which is a multi-dimensional array type available in the Python programming language. However, other data structures in other programming languages (e.g., a multi-dimensional array in the Java or C# languages) can also be used. The predict data structure may also be sorted by any of the predict action, real action, person index, or timestamp values.
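As a minimal illustrative sketch only, a predict data structure of the kind described above could be built as a NumPy record array as follows. The field names (`model`, `predict_action`, `real_action`, `person_index`, `timestamp`), the integer encoding of actions, and the sample values are assumptions made for illustration and are not specified in this description.

```python
import numpy as np

# Hypothetical field layout; names and types are illustrative assumptions.
predict_dtype = np.dtype([
    ("model", "U2"),            # "2D" or "3D"
    ("predict_action", "i4"),   # predicted action class (integer code)
    ("real_action", "i4"),      # real action class; -1 could mark "not available"
    ("person_index", "i4"),     # driver identifier, if used
    ("timestamp", "f8"),        # timestamp of the source image, in seconds
])

# Made-up example rows, deliberately out of time order.
rows = [
    ("3D", 2, 2, 7, 12.5),
    ("2D", 2, 2, 7, 12.0),
    ("2D", 0, 0, 7, 10.0),
    ("3D", 1, 0, 7, 10.4),
]

# Build the record array; fields are accessible as attributes.
predict_data = np.rec.array(rows, dtype=predict_dtype)

# Sort in place by timestamp, breaking ties by model name.
predict_data.sort(order=["timestamp", "model"])

print(predict_data.timestamp)
print(predict_data.predict_action)
```

Passing a different field list to `order` would sort the same structure by predict action, real action, or person index instead.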
Once the ensemble model generates the predict data structure, the enhanced ensemble prediction may be generated (block 820), for example, by passing the predict data structure to the ensemble model's “ensemble” function. The ensemble function can analyze the 2D and 3D predict actions in the predict data structure to determine an enhanced prediction for each pair of corresponding 2D and 3D images (i.e., a “2D3D image pair”). A corresponding 2D3D image pair can be determined, for example, based on a 2D image and a 3D image having the same (or similar) timestamps, where the timestamps either have the same time value or have time values that differ by, for example, several seconds. The enhanced prediction can be based on the classification probabilities from each of the underlying 2D and 3D models of the ensemble model.
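The pairing and combination steps might be sketched as follows. This is one possible approach under stated assumptions, not necessarily the method used by the ensemble function: the helper names `pair_by_timestamp` and `ensemble`, the 2-second tolerance, the nearest-timestamp matching rule, and the simple averaging of class probabilities are all hypothetical choices, since this description states only that the enhanced prediction can be based on the probabilities from the two models.

```python
import numpy as np

def pair_by_timestamp(ts_2d, ts_3d, tolerance=2.0):
    """Pair each 2D image with the nearest-in-time 3D image.

    Returns (i, j) index pairs where ts_3d[j] is the closest 3D timestamp
    to ts_2d[i] and the two differ by at most `tolerance` seconds
    (an assumed threshold).
    """
    ts_2d = np.asarray(ts_2d, dtype=float)
    ts_3d = np.asarray(ts_3d, dtype=float)
    pairs = []
    if ts_3d.size == 0:
        return pairs
    for i, t in enumerate(ts_2d):
        j = int(np.argmin(np.abs(ts_3d - t)))
        if abs(ts_3d[j] - t) <= tolerance:
            pairs.append((i, j))
    return pairs

def ensemble(probs_2d, probs_3d, pairs):
    """Average the paired class probabilities and return the top class.

    Averaging is an assumed combination rule chosen for illustration.
    """
    probs_2d = np.asarray(probs_2d, dtype=float)
    probs_3d = np.asarray(probs_3d, dtype=float)
    enhanced = []
    for i, j in pairs:
        combined = (probs_2d[i] + probs_3d[j]) / 2.0
        enhanced.append(int(np.argmax(combined)))
    return enhanced

# Made-up timestamps (seconds) and per-class probabilities for three classes.
ts_2d = [10.0, 12.0, 20.0]
ts_3d = [10.4, 12.5]
probs_2d = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.1, 0.8, 0.1]]
probs_3d = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

pairs = pair_by_timestamp(ts_2d, ts_3d)
enhanced = ensemble(probs_2d, probs_3d, pairs)
print(pairs)     # [(0, 0), (1, 1)]
print(enhanced)  # [1, 2]
```

In this example the 2D image at 20.0 seconds has no 3D image within the tolerance and is left unpaired. For the first pair, the 2D model alone favors class 0, while the averaged probabilities favor class 1. Note that this simple matching rule allows one 3D image to be paired with more than one 2D image; a one-to-one matching would need an additional constraint.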