The captured image and sound data is passed from the video cameras 32, 42 and microphones 36, 46 to the respective computers 39, 49 where it is analysed by the respective processors 40, 50 (step 102, FIG. 7). The analysis of the video image data captured by the video cameras 32, 42 enables features (e.g. of the users' faces and bodies) to be identified using feature recognition (e.g. by finding points in the video image data containing high contrast).
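By way of illustration only, finding points of high contrast may be sketched as follows. The sketch assumes a greyscale frame held as a list of rows of intensity values and uses a simple central-difference gradient with a fixed threshold. The function name `high_contrast_points`, the `threshold` parameter and the sample frame are hypothetical and are not taken from the described system, which may use any suitable feature recognition technique.

```python
def high_contrast_points(image, threshold):
    """Return (row, col) positions whose local contrast exceeds threshold.

    image: list of rows of greyscale intensities (assumed input format).
    threshold: minimum gradient magnitude to count as high contrast
               (hypothetical tuning parameter).
    """
    points = []
    rows, cols = len(image), len(image[0])
    # Skip the border so that every pixel has neighbours on all sides.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the intensity gradient.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                points.append((r, c))
    return points


# Synthetic frame with a vertical edge between dark and bright regions.
frame = [[10, 10, 10, 200, 200, 200] for _ in range(4)]
edge_points = high_contrast_points(frame, threshold=100)
```

In this sketch only the interior pixels adjacent to the dark-to-bright edge are reported, which is the behaviour expected of a contrast-based feature detector.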
The three-dimensional (3D) positions of the features captured in the video image data are also determined for each of the video cameras 32, 42, using triangulation between the different video cameras 32, 42 in each array (step 103, FIG. 7). Using this determination of the 3D positions, a depth (z) position is then assigned to each point of each image captured by the video cameras 32, 42.
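By way of illustration only, the triangulation of a single feature may be sketched for the simplest case of two parallel, horizontally displaced cameras. The sketch assumes a rectified camera pair with a known focal length (in pixels) and baseline (in metres), and image coordinates measured from each camera's principal point. These assumptions, the function name `triangulate` and its parameters are hypothetical; the described arrays may use more cameras and a more general calibration.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m):
    """Return the (x, y, z) position of a feature seen by two cameras.

    x_left, x_right: horizontal image coordinate of the same feature in
                     the left and right cameras, in pixels.
    y:               vertical image coordinate, in pixels (assumed equal
                     in both cameras because the pair is rectified).
    focal_px:        focal length in pixels (assumed known).
    baseline_m:      distance between the two cameras in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("feature must have positive disparity")
    # Depth is inversely proportional to disparity.
    z = focal_px * baseline_m / disparity
    # Back-project the left-camera pixel to metric coordinates.
    x = x_left * z / focal_px
    y_m = y * z / focal_px
    return (x, y_m, z)


# A feature 40 px apart between cameras spaced 0.1 m, focal length 800 px.
position = triangulate(x_left=40, x_right=0, y=20, focal_px=800, baseline_m=0.1)
```

The z component returned by such a calculation corresponds to the depth (z) position assigned to the feature; depths for the remaining image points could then be interpolated from neighbouring features.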
Using the feature recognition of the video image data, the respective processors 40, 50 determine a location at which to position a virtual camera and the direction in which it should be pointed (step 104, FIG. 7). For example, bodies, faces and/or eyes of users that have been identified in the video image data captured by the video cameras 32, 42 are used to determine the location and direction of the virtual camera. The video image data that is eventually sent to the other party on the video conferencing call will appear to come from the perspective of the virtual camera.
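By way of illustration only, one possible rule for choosing the location and direction of the virtual camera from detected eye positions may be sketched as follows. The sketch assumes that the 3D eye positions are expressed in a coordinate system whose origin is the centre of the display, with the display lying in the plane z = 0 and z increasing towards the user. It places the virtual camera on the display plane as close as possible to the point directly in front of the eyes and points it at the midpoint between the eyes. This rule, the function name `virtual_camera_pose` and the display dimensions are hypothetical and are not stated in the description above.

```python
import math


def virtual_camera_pose(left_eye, right_eye, half_width, half_height):
    """Return (position, direction) for a virtual camera.

    left_eye, right_eye: (x, y, z) eye positions in metres.
    half_width, half_height: half the display dimensions in metres
                             (hypothetical limits on the camera location).
    """
    # Aim at the midpoint between the two detected eyes.
    target = tuple((a + b) / 2 for a, b in zip(left_eye, right_eye))
    # Place the camera on the display plane, clamped to the display area.
    px = min(max(target[0], -half_width), half_width)
    py = min(max(target[1], -half_height), half_height)
    position = (px, py, 0.0)
    # The viewing direction is the unit vector from camera to target.
    delta = tuple(t - p for t, p in zip(target, position))
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0:
        raise ValueError("eyes coincide with the camera position")
    direction = tuple(d / length for d in delta)
    return position, direction


# A user standing to the right of a 1.0 m x 0.6 m display, 1.2 m away.
pose = virtual_camera_pose((0.87, 0.0, 1.2), (0.93, 0.0, 1.2), 0.5, 0.3)
```

With these example values the camera sits at the right-hand edge of the display and is angled towards the user rather than pointing straight out of the display.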