The depth component of the three-dimensional positions of the features is preferably determined by determining the displacement between the features (e.g. by triangulating the positions of the features) using the video image data and/or the other sensor data from the array of video camera(s) and/or sensor(s), or by any other suitable method. For example, features with a smaller distance between them may be determined to have a greater depth than features with a greater distance between them (e.g. for video camera(s) and/or sensor(s) that have axes which are parallel or otherwise aligned), e.g. exploiting the perspective in the captured data. Thus the method preferably comprises determining (and the processing circuitry is preferably configured to determine) the depth component of the three-dimensional position(s) of the feature(s), e.g. using the two-dimensional (e.g. x-y) distance between features in the video image data and/or the other sensor data. Determining the three-dimensional positions of the features in the video image data and/or the other sensor data enables a depth to be assigned to each of the features (e.g. in addition to their two-dimensional position in the frames of video image data). The three-dimensional positions may be stored as, for example, a depth map, a 3D point cloud, a 3D mesh or a depth buffer.
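Purely by way of illustration, and not as a definition of the claimed method, the sketch below shows one way such a depth determination could be carried out. It assumes two parallel, aligned cameras of the array, a known focal length and baseline, and feature positions that have already been matched between the two views (all of these are illustrative assumptions); the depth of each feature is recovered from its displacement (disparity) between the views and the results are stored as a sparse depth map, one of the storage options mentioned above.

```python
import numpy as np

def feature_depths(features_left, features_right, focal_length_px, baseline_m):
    """Triangulate a depth for matched features seen by two parallel, aligned cameras.

    features_left / features_right: (N, 2) arrays of corresponding (x, y) pixel
    positions of the same features in the two views.
    Returns an (N, 3) array of (x, y, depth) values for the left view.
    """
    features_left = np.asarray(features_left, dtype=float)
    features_right = np.asarray(features_right, dtype=float)

    # Displacement (disparity) of each feature between the two views.
    disparity = features_left[:, 0] - features_right[:, 0]
    disparity = np.where(np.abs(disparity) < 1e-6, np.nan, disparity)

    # For parallel, aligned cameras: depth = focal_length * baseline / disparity,
    # so features with a smaller displacement are assigned a greater depth.
    depth = focal_length_px * baseline_m / disparity

    return np.column_stack([features_left, depth])


def to_depth_map(features_xyz, frame_shape):
    """Store the per-feature depths as a sparse depth map (a 3D point cloud,
    3D mesh or depth buffer could equally be produced from the same values)."""
    depth_map = np.full(frame_shape, np.nan)
    for x, y, z in features_xyz:
        if np.isfinite(z):
            depth_map[int(round(y)), int(round(x))] = z
    return depth_map


# Illustrative usage with hypothetical camera parameters and feature matches.
left = [(320.0, 240.0), (100.5, 50.0)]
right = [(300.0, 240.0), (95.5, 50.0)]
points = feature_depths(left, right, focal_length_px=800.0, baseline_m=0.1)
depth_map = to_depth_map(points, frame_shape=(480, 640))
```

In this sketch each feature is thereby assigned a depth in addition to its two-dimensional position in the frame, consistent with the description above; the same (x, y, depth) triples could instead be assembled into a 3D point cloud or mesh.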