In one embodiment, particular (e.g. identified) features in the video image data and/or the sensor data are selected based on image recognition of these features, e.g. in addition to selecting the features based on their three-dimensional positions. This may allow the participant(s) and their face(s) to be selected from the video image data and/or the sensor data.
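By way of illustration only, the following sketch shows one way such a two-stage selection could be expressed, assuming a hypothetical image-recognition step that yields labelled detections and a hypothetical depth_to_world mapping from pixels to three-dimensional positions; neither the detection format nor the capture volume shown is prescribed by the method.

```python
import numpy as np

# Illustrative capture volume: an axis-aligned box in room coordinates (metres).
# The actual volume and the detector are assumptions made for this sketch.
VOLUME_MIN = np.array([-1.0, -1.0, 0.5])
VOLUME_MAX = np.array([1.0, 1.0, 2.5])

def select_features(detections, depth_to_world):
    """Keep detections that are recognised as a relevant feature AND whose
    three-dimensional position lies inside the capture volume.

    detections     : list of dicts with "label", "confidence" and "pixel" (u, v),
                     as produced by some image-recognition step (assumed format).
    depth_to_world : callable mapping a pixel (u, v) to a 3D point (x, y, z),
                     e.g. derived from a depth sensor (assumed to exist).
    """
    selected = []
    for det in detections:
        # Criterion 1: image recognition of the feature itself.
        if det["label"] not in ("face", "eye", "nose", "mouth") or det["confidence"] < 0.5:
            continue
        # Criterion 2: the feature's three-dimensional position within the volume.
        position = np.asarray(depth_to_world(det["pixel"]), dtype=float)
        if np.all(position >= VOLUME_MIN) and np.all(position <= VOLUME_MAX):
            selected.append({**det, "position": position})
    return selected

# Example with placeholder inputs: a single detected face at roughly 1.5 m depth.
faces = select_features(
    [{"label": "face", "confidence": 0.9, "pixel": (320, 240)}],
    lambda pixel: (0.0, 0.2, 1.5),
)
```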
When multiple virtual cameras are defined, different features may be selected to be shown from the perspectives of the respective virtual cameras. For example, each virtual camera may be used to portray a single selected feature (e.g. a feature of a participant, such as their eyes, nose or mouth) from the perspective of that virtual camera.
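The sketch below illustrates, purely by way of example, one possible mapping of one virtual camera to each selected feature, building on the select_features output sketched above; placing each viewpoint at a fixed stand-off in front of its feature is an assumption made for the illustration, not a requirement of the method.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VirtualCamera:
    position: np.ndarray   # viewpoint of the virtual camera
    look_at: np.ndarray    # point the virtual camera is aimed at
    feature: dict          # the single selected feature this camera portrays

def cameras_for_features(selected_features, standoff=0.6):
    """Define one virtual camera per selected feature (e.g. eyes, nose, mouth),
    each aimed at its feature. Placing the viewpoint `standoff` metres in front
    of the feature along the room's z-axis is an illustrative choice only."""
    cameras = []
    for feat in selected_features:
        target = np.asarray(feat["position"], dtype=float)
        position = target - np.array([0.0, 0.0, standoff])
        cameras.append(VirtualCamera(position=position, look_at=target, feature=feat))
    return cameras
```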
Once the features falling within the volume have been selected for further processing, the video image data and/or the sensor data (e.g. of the selected features) from the video camera(s) in the array are, for example, used (e.g. combined) to form a single, composite stream of video image data and/or sensor data which appears to have been captured from the perspective of the virtual camera. Thus preferably the method comprises (and the processing circuitry is configured to perform) combining the video image data from the one or more video cameras and/or the data captured by the one or more sensors to form the single view of the feature(s) as appearing to have been captured from the virtual camera. Preferably the video image data and/or the sensor data (e.g. of the selected features) are processed (e.g. combined) such that the face(s) and/or eye(s) and/or body of the participant(s) in the captured video image data and/or sensor data are oriented perpendicular to the direction from the virtual camera to them.
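By way of illustration, the sketch below shows one possible way of combining data from several real cameras into a single view rendered from the virtual camera: coloured three-dimensional points recovered from each real camera (an assumed input format) are reprojected through a pinhole model whose image plane is perpendicular to the direction from the virtual camera to the feature. The z-buffered splatting used here is only one illustrative fusion strategy, not the prescribed one.

```python
import numpy as np

def look_at_rotation(cam_pos, target, up=np.array([0.0, 1.0, 0.0])):
    """Rotation whose rows are the virtual camera's right/up/forward axes, so the
    virtual image plane is perpendicular to the camera-to-feature direction."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(up, forward)
    right = right / np.linalg.norm(right)   # assumes forward is not parallel to up
    true_up = np.cross(forward, right)
    return np.stack([right, true_up, forward])

def composite_from_virtual_camera(point_clouds, cam_pos, target,
                                  focal=800.0, size=(640, 480)):
    """Fuse coloured 3D points recovered from several real cameras into a single
    image rendered from the virtual camera's viewpoint (simple z-buffered splat).

    point_clouds : iterable of (N, 6) arrays, columns x, y, z, r, g, b, one per
                   real camera in the array (assumed input format).
    """
    w, h = size
    cam_pos = np.asarray(cam_pos, dtype=float)
    target = np.asarray(target, dtype=float)
    R = look_at_rotation(cam_pos, target)
    image = np.zeros((h, w, 3), dtype=np.uint8)
    zbuf = np.full((h, w), np.inf)
    for cloud in point_clouds:                      # combine data from every real camera
        pts, rgb = cloud[:, :3], cloud[:, 3:]
        cam = (pts - cam_pos) @ R.T                 # world -> virtual-camera coordinates
        keep = cam[:, 2] > 0.05                     # only points in front of the camera
        cam, rgb = cam[keep], rgb[keep]
        u = (focal * cam[:, 0] / cam[:, 2] + w / 2).astype(int)
        v = (h / 2 - focal * cam[:, 1] / cam[:, 2]).astype(int)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        for ui, vi, zi, ci in zip(u[ok], v[ok], cam[ok, 2], rgb[ok]):
            if zi < zbuf[vi, ui]:                   # keep the nearest sample per pixel
                zbuf[vi, ui] = zi
                image[vi, ui] = ci.astype(np.uint8)
    return image
```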