Following this, when multiple virtual cameras 71 are being used, the multiple depth buffers are merged (step 108, FIG. 7). Once the depth buffers have been merged, a volumetric stage is applied to the captured video image data; when there is only a single virtual camera 61, and therefore only a single depth buffer, the volumetric stage is instead applied as a separate step (step 108, FIG. 7).
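The description does not say how the depth buffers are merged. As one possible illustration, the sketch below assumes that every depth buffer has already been reprojected into a common viewpoint at the same resolution, and merges them by keeping the nearest valid depth at each pixel. The names `merge_depth_buffers` and `DepthBuffer`, and the use of `math.inf` to mark pixels with no depth data, are hypothetical choices made for this example only.

```python
import math
from typing import List

# Hypothetical representation: rows of per-pixel depths; math.inf marks "no data".
DepthBuffer = List[List[float]]


def merge_depth_buffers(buffers: List[DepthBuffer]) -> DepthBuffer:
    """Merge aligned depth buffers by keeping the nearest depth at each pixel.

    Assumes all buffers are already expressed in a shared viewpoint and
    resolution; this is an illustrative merge rule, not the one in the source.
    """
    if not buffers:
        raise ValueError("need at least one depth buffer")
    rows, cols = len(buffers[0]), len(buffers[0][0])
    for buf in buffers:
        if len(buf) != rows or any(len(row) != cols for row in buf):
            raise ValueError("depth buffers must share the same resolution")
    # The closest surface seen by any virtual camera wins at each pixel.
    return [
        [min(buf[r][c] for buf in buffers) for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    cam_a = [[1.0, math.inf]]  # second pixel not seen by camera A
    cam_b = [[2.0, 3.0]]
    print(merge_depth_buffers([cam_a, cam_b]))
```

With a single virtual camera, the function simply returns a copy of the one buffer, which mirrors the single-depth-buffer case described above.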
The step of applying the volumetric stage 52 is shown in FIG. 5. Using the 3D positions assigned to the features captured in the video image data, the volumetric stage 52 identifies and extracts the features that fall within it. The remaining captured video image data, corresponding to features identified as lying outside the volumetric stage 52, is discarded. Applying the volumetric stage 52 therefore means that only the video image data of interest to the other parties of the video conferencing call (i.e. the video image data of the participants within the volumetric stage 52 in the room 51) needs to be processed further for transmission to those parties.
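The filtering performed by the volumetric stage 52 can be sketched as a containment test on each feature's 3D position. The description does not specify the shape of the volumetric stage, so this example assumes, purely for illustration, an axis-aligned box defined by two corners. The classes `VolumetricStage` and `Feature` and the function `apply_volumetric_stage` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class VolumetricStage:
    """Hypothetical axis-aligned box standing in for the volumetric stage 52."""
    min_corner: Point3D
    max_corner: Point3D

    def contains(self, p: Point3D) -> bool:
        # A point is inside when every coordinate lies within the box bounds.
        return all(lo <= v <= hi for lo, v, hi in zip(self.min_corner, p, self.max_corner))


@dataclass(frozen=True)
class Feature:
    """Hypothetical captured feature with its assigned 3D position."""
    position: Point3D
    colour: Tuple[int, int, int]


def apply_volumetric_stage(features: Iterable[Feature], stage: VolumetricStage) -> List[Feature]:
    """Keep only features inside the stage; all other features are discarded."""
    return [f for f in features if stage.contains(f.position)]


if __name__ == "__main__":
    stage = VolumetricStage((0.0, 0.0, 0.0), (2.0, 2.0, 3.0))
    captured = [
        Feature((1.0, 1.0, 1.0), (200, 150, 120)),  # participant inside the stage
        Feature((1.0, 1.0, 5.0), (90, 90, 90)),     # back wall, outside the stage
    ]
    print(apply_volumetric_stage(captured, stage))
```

Only the surviving features would then be passed on for further processing and transmission; a real volumetric stage could equally be an arbitrary convex or mesh-defined volume, which would only change the `contains` test.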