When, as outlined below, features falling outside a particular volume are discarded, this may be performed simply using the xy coordinates of the features in the common coordinate system once the transformation has been performed, e.g. because these features lie outside the viewing frustum of the virtual camera. Furthermore, features that obscure one another owing to having the same xy coordinates but different z coordinates (e.g. following transformation) may be identified, and the features appearing further away from the virtual camera may be discarded, e.g. such that only the feature that is closest to the virtual camera is retained.
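By way of illustration only, the discarding described above might be sketched as follows. The function name `cull_and_resolve_occlusion`, the rectangular xy bounds standing in for the viewing frustum, the rounding of xy coordinates to integer cells, and the convention that a smaller z is closer to the virtual camera are all assumptions made for this example and are not prescribed by the method.

```python
from typing import Dict, Iterable, List, Tuple

# (x, y, z) of a feature after transformation into the common coordinate system.
Feature = Tuple[float, float, float]


def cull_and_resolve_occlusion(
    features: Iterable[Feature],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[Feature]:
    """Drop features outside the xy bounds, then keep the nearest per xy cell.

    Assumption: a smaller z means the feature is closer to the virtual camera.
    """
    nearest: Dict[Tuple[int, int], Feature] = {}
    for x, y, z in features:
        # Discard features whose xy coordinates fall outside the visible area.
        if not (x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]):
            continue
        # Features sharing an xy cell obscure one another; retain the closest.
        cell = (round(x), round(y))
        if cell not in nearest or z < nearest[cell][2]:
            nearest[cell] = (x, y, z)
    return list(nearest.values())
```

For example, given two features at the same xy position with z values of 5 and 2, only the feature with z equal to 2 would be retained under the stated convention.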
Preferably a depth (z) buffer (e.g. in the coordinate system of the virtual camera) is defined and filled with the (e.g. transformed) depth (z) position of each of the features represented in the video image data and/or the sensor data. If any (e.g. depth) data is missing at this stage for any of the features represented in the video image data and/or the sensor data, preferably this data is interpolated from the data which is present.
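A minimal sketch of defining and filling such a depth buffer, and of interpolating missing entries, is given below. The names `fill_depth_buffer` and `interpolate_row` are hypothetical, integer pixel coordinates are assumed, and linear interpolation along each row is only one possible choice, since the interpolation scheme is not specified here.

```python
from typing import Iterable, List, Optional, Tuple

DepthRow = List[Optional[float]]


def fill_depth_buffer(
    features: Iterable[Tuple[int, int, float]], width: int, height: int
) -> List[DepthRow]:
    """Store the nearest (smallest, by assumption) z value per pixel cell."""
    buf: List[DepthRow] = [[None] * width for _ in range(height)]
    for x, y, z in features:
        if 0 <= x < width and 0 <= y < height:
            current = buf[y][x]
            if current is None or z < current:
                buf[y][x] = z
    return buf


def interpolate_row(row: DepthRow) -> DepthRow:
    """Fill missing depths linearly from the nearest known neighbours in the row."""
    known = [i for i, v in enumerate(row) if v is not None]
    if not known:
        return list(row)  # nothing to interpolate from
    out = list(row)
    for i in range(len(row)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = row[right]  # before the first known value: extend it
        elif right is None:
            out[i] = row[left]  # after the last known value: extend it
        else:
            t = (i - left) / (right - left)
            out[i] = row[left] + t * (row[right] - row[left])
    return out
```

In this sketch, entries beyond the first or last known value in a row are simply copied from the nearest known value; a two-dimensional scheme could equally be used.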
When a plurality of virtual cameras have been defined, preferably a separate depth buffer is defined and filled for each virtual camera.
Using the transformed three-dimensional position(s) in the common coordinate system of the feature(s) in the video image data and/or sensor data, preferably the method comprises (and the processing circuitry is configured to) selecting the feature(s) in the video image data and/or the sensor data having transformed three-dimensional position(s) in the common coordinate system that are within a particular range of three-dimensional positions. Thus a three-dimensional volume is set and features falling within this volume are selected.
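As a sketch only, the selection of features within a set volume could take the following form. It assumes that the "particular range of three-dimensional positions" is an axis-aligned box with inclusive bounds; the `Volume` class and `select_features` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]  # transformed (x, y, z) in the common coordinate system


@dataclass(frozen=True)
class Volume:
    """Axis-aligned box (an assumption; other volume shapes are possible)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_min: float
    z_max: float

    def contains(self, p: Point) -> bool:
        x, y, z = p
        return (
            self.x_min <= x <= self.x_max
            and self.y_min <= y <= self.y_max
            and self.z_min <= z <= self.z_max
        )


def select_features(features: Iterable[Point], volume: Volume) -> List[Point]:
    """Keep only the features whose transformed positions lie inside the volume."""
    return [f for f in features if volume.contains(f)]
```

Features lying exactly on a face of the box are treated as inside it in this sketch; whether the bounds are inclusive is a design choice left open here.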