In various embodiments, and with respect to blocks 804 and 814, an ensemble-based model may obtain a “chunk” of one or more 2D or 3D images, such as a series or set of 2D or 3D images from a certain timeframe (or time period) of a movie file or other set of related images. In certain embodiments, the size of the chunk (i.e., the number of images to analyze in a particular timeframe, e.g., 100 images for a 5-second timeframe captured at 20 frames per second) may be predetermined by the computing device or set by an operator of the computing device.
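For illustration only, the chunking described above may be sketched as follows. The function names `chunk_size` and `iter_chunks` are hypothetical, and the sketch assumes that frames are held in an ordered sequence and that a final, shorter chunk is retained rather than discarded; an actual embodiment may handle partial chunks differently.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_size(frames_per_second: int, seconds: int) -> int:
    """Number of images in one chunk, e.g., 20 fps over 5 seconds gives 100."""
    return frames_per_second * seconds


def iter_chunks(frames: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `size` frames.

    Assumption (hypothetical): the last chunk may be shorter than `size`
    when the number of frames is not an exact multiple of `size`.
    """
    if size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(frames), size):
        yield list(frames[start:start + size])


if __name__ == "__main__":
    frames = list(range(250))  # stand-ins for 250 decoded frames
    size = chunk_size(frames_per_second=20, seconds=5)
    print([len(c) for c in iter_chunks(frames, size)])
```

Running the example splits 250 stand-in frames into chunks of 100, 100, and 50 frames.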
At block 815, the 2D and 3D images are standardized so that 2D and 3D images whose formats or file types might otherwise be incompatible can be compared for predictive purposes. For example, at block 806, a 2D image may be resized and normalized to a certain resolution. In one embodiment, if a first group of the 2D image(s) obtained at block 804 were originally captured by a 5-megapixel camera device, such 2D images would have 2560 pixels on the horizontal axis and 1920 pixels on the vertical axis. If a second group of the 2D image(s) obtained at block 804 were originally captured by a 3-megapixel camera device, such 2D images would have 2048 pixels on the horizontal axis and 1536 pixels on the vertical axis. At block 806, each of the 5-megapixel and 3-megapixel 2D images may be downsized to generate new images that represent the original images but have only 640 (horizontal)×480 (vertical) pixels. In this way, each of the obtained 2D images is resized and normalized into a common size and format, e.g., pixel resolution, for use in the ensemble prediction model.
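As a non-limiting sketch of the resizing at block 806, the following assumes that each 2D image is a grayscale image stored as a list of pixel rows, and uses nearest-neighbor sampling as a stand-in resampling method; the function names, the row-list representation, and the optional intensity scaling to [0, 1] are all hypothetical and are not stated in the description above.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale image as rows (height) of pixel values (width).
Image = List[List[float]]


def resize_nearest(image: Sequence[Sequence[float]], out_w: int, out_h: int) -> Image:
    """Resize by nearest-neighbor sampling to out_w (horizontal) x out_h (vertical)."""
    in_h = len(image)
    in_w = len(image[0])
    return [
        # Map each output coordinate back to the proportionally located input pixel.
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def scale_intensity(image: Sequence[Sequence[float]], max_value: float = 255) -> Image:
    """Assumed extra step: scale pixel values from [0, max_value] into [0, 1]."""
    return [[p / max_value for p in row] for row in image]


def standardize(image: Sequence[Sequence[float]], out_w: int = 640, out_h: int = 480) -> Image:
    """Bring an image of any source resolution to the common 640x480 size."""
    return scale_intensity(resize_nearest(image, out_w, out_h))


if __name__ == "__main__":
    five_mp = [[0] * 2560 for _ in range(1920)]   # 2560 x 1920 source image
    three_mp = [[0] * 2048 for _ in range(1536)]  # 2048 x 1536 source image
    for img in (five_mp, three_mp):
        out = standardize(img)
        print(len(out[0]), "x", len(out))
```

Both source images yield 640×480 outputs, since 2560×1920 and 2048×1536 share the 4:3 aspect ratio of the target size. In practice, an image library's resize routine (e.g., Pillow's `Image.resize` with an area-averaging or Lanczos filter) would likely be preferred over nearest-neighbor sampling for downsizing.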