Preferably the comparison of identified features in the video image data and/or other sensor data from different video camera(s) and/or sensor(s) in the array takes into account the scale and rotation of the identified features, e.g. owing to an identified feature appearing differently depending on the relative location of the video camera(s) and/or sensor(s).
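Purely by way of illustration, such a scale- and rotation-aware comparison might be realised with feature descriptors that are computed at each feature's detected scale and dominant orientation (for example ORB descriptors in OpenCV); the function names below are hypothetical and are not part of the described arrangement.

```python
# Illustrative sketch only (names hypothetical): detect features whose
# descriptors are computed at each feature's own scale and dominant
# orientation, so the same physical feature can be compared between two
# cameras even though it appears larger/smaller or rotated in one view.
import cv2

def detect_scale_rotation_aware_features(image_gray):
    orb = cv2.ORB_create(nfeatures=1000)
    # Each returned keypoint carries kp.size (scale) and kp.angle
    # (orientation); the descriptor is computed relative to these values,
    # which is what makes the later comparison tolerant of scale and
    # rotation differences between views.
    keypoints, descriptors = orb.detectAndCompute(image_gray, None)
    return keypoints, descriptors
```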
The matching of identified features in the video image data and/or other sensor data is preferably performed for the video image data and/or other sensor data from one or more pairs of video camera(s) and/or sensor(s) in the array. Matched features (e.g. features whose comparison metric passes the applied threshold) are deemed a matched pair (and the data is flagged or stored as such). Identified features that are not matched, or that are matched with two or more other identified features, may be stored for later use, as sketched below.
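A minimal sketch of how the per-pair matching, thresholding, flagging and setting-aside of unmatched features might look in practice follows; the distance threshold, the data structures and the function name are illustrative assumptions rather than part of the described method.

```python
# Illustrative sketch only (threshold value, data structures and names are
# assumptions): match descriptors between one pair of cameras, flag features
# that pass the distance threshold as a matched pair, and set aside features
# that remain unmatched in this pair for later use.
import cv2

def match_camera_pair(desc_a, desc_b, max_distance=40):
    # Cross-checked matching keeps only one-to-one correspondences, so
    # ambiguous many-to-one matches are not flagged as pairs here.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)

    matched_pairs = []          # (feature index in camera A, index in camera B)
    matched_a, matched_b = set(), set()
    for m in matches:
        if m.distance < max_distance:   # metric passes the threshold
            matched_pairs.append((m.queryIdx, m.trainIdx))
            matched_a.add(m.queryIdx)
            matched_b.add(m.trainIdx)

    # Features not matched in this pair are kept for later use,
    # e.g. for matching against other cameras/sensors in the array.
    unmatched_a = [i for i in range(len(desc_a)) if i not in matched_a]
    unmatched_b = [i for i in range(len(desc_b)) if i not in matched_b]
    return matched_pairs, unmatched_a, unmatched_b
```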
At this stage, preferably a depth map, a 3D point cloud, a 3D mesh or a depth buffer is created for each pair of video camera(s) and/or sensor(s) in the array (e.g. for each pair between which identified feature(s) have been matched), for storing the (e.g. depth component of the) determined three-dimensional position(s) of the identified and matched feature(s). As outlined above, preferably the depth component of the three-dimensional position(s) of the identified and matched feature(s) is determined from the displacement between the features (e.g. by triangulating the positions of the features) using the video image data and/or the other sensor data from the array of video camera(s) and/or sensor(s).
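Purely as an illustration of this triangulation step, the sketch below recovers the three-dimensional positions of the matched features for one camera pair from the displacement of those features between the two views, keeping the depth components for a per-pair depth record; the projection matrices, the use of OpenCV's triangulatePoints and the helper name are assumptions rather than part of the described arrangement.

```python
# Illustrative sketch only (projection matrices, names and the use of
# cv2.triangulatePoints are assumptions): recover 3D positions of matched
# features for one camera pair from their pixel displacement between the
# two views, and keep them as a small per-pair point cloud / depth record.
import numpy as np
import cv2

def triangulate_pair(pts_a, pts_b, proj_a, proj_b):
    """pts_a, pts_b: (N, 2) pixel positions of the matched features in each view.
    proj_a, proj_b: 3x4 projection matrices of the two cameras in the pair."""
    pts_a = np.asarray(pts_a, dtype=np.float64).T           # shape (2, N)
    pts_b = np.asarray(pts_b, dtype=np.float64).T
    homog = cv2.triangulatePoints(proj_a, proj_b, pts_a, pts_b)  # (4, N) homogeneous
    points_3d = (homog[:3] / homog[3]).T                     # (N, 3): X, Y, Z
    depths = points_3d[:, 2]                                 # depth component only
    return points_3d, depths
```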