The invention claimed is:

1. A method of capturing data for use in a video conference, the method comprising:
    capturing data of a party at a first location using an array of one or more video cameras and/or one or more sensors;
    wherein the one or more video cameras and/or one or more sensors in the array are located in the same plane;
    wherein the field of view of the one or more video cameras and/or one or more sensors in the array is directed outwards perpendicularly to the plane in which they are located; and
    determining, for each of the one or more video cameras and/or each of the one or more sensors in the array, the three-dimensional position(s) of one or more features represented in the data captured by the video camera or sensor;
    defining a virtual camera positioned at a three-dimensional virtual camera position;
    transforming the three-dimensional position(s) determined for the feature(s) represented in the data into a common coordinate system to form a single view of the feature(s) as appearing to have been captured from the virtual camera using the video image data from the one or more video cameras and/or the data from the one or more sensors;
    transmitting and/or storing the video image and/or sensor data of the feature(s) viewed from the perspective of the virtual camera and/or data representative of the transformed three-dimensional position(s) of the feature(s); and
    wherein the method further comprises determining a depth component of the three-dimensional position(s) of the feature(s) and transforming the image data and/or the sensor data of the feature(s) into the common coordinate system using an xy translation inversely proportional to the determined depth of the feature(s).

2. The method as claimed in claim 1, further comprising selecting the feature(s) in the video image and/or sensor data having transformed three-dimensional position(s) in the common coordinate system that are within a particular range of three-dimensional positions.

3. The method as claimed in claim 1, wherein a depth component of the three-dimensional position(s) of the feature(s) is determined by triangulating the positions of the feature(s) using the video image data from the video camera(s) and/or the sensor data from the sensor(s).

4. The method as claimed in claim 1, wherein the method comprises calibrating the positions of the video camera(s) and/or sensor(s) in the array of video camera(s) and/or sensor(s).

5. The method as claimed in claim 1, wherein the method comprises identifying feature(s) in the video image data and/or the other sensor data captured by the array of video camera(s) and/or sensor(s).

6. The method as claimed in claim 5, wherein the step of identifying feature(s) in the video image data or other sensor data comprises identifying feature(s) in one or more blocks of the video image data and/or the other sensor data.

7. The method as claimed in claim 1, wherein the method comprises identifying participant(s) of the first party in the video image and/or sensor data captured by the array of video camera(s) and/or sensor(s).

8. The method as claimed in claim 7, wherein the virtual camera is positioned using the participant(s) of the first party identified in the captured video image and/or sensor data and/or the direction in which the participant(s) are looking or facing.
9. The method as claimed in claim 5, wherein the method comprises comparing one or more identified features or participants in the video image data and/or other sensor data from one of the video camera(s) and/or sensor(s) in the array with one or more identified features or participants in the video image data and/or other sensor data from other(s) of the video camera(s) and/or sensor(s) in the array, and matching the same or similar identified features or participants with each other.

10. The method as claimed in claim 9, wherein the method comprises matching the video image data and/or other sensor data from one or more pairs of video camera(s) and/or sensor(s) in the array.

11. The method as claimed in claim 9, wherein the method comprises forming a depth map, a 3D point cloud, a 3D mesh or a depth buffer for each pair of video camera(s) and/or sensor(s) in the array between which identified feature(s) have been matched and storing the determined three-dimensional position(s) of the identified and matched feature(s) in the depth map, 3D point cloud, 3D mesh or depth buffer.

12. The method as claimed in claim 9, wherein the method comprises using the video image data and/or sensor data from other(s) of the video camera(s) and/or sensor(s) in the array to refine the three-dimensional position(s) of the identified and matched feature(s).

13. The method as claimed in claim 1, the method further comprising defining a plurality of virtual cameras positioned at respective three-dimensional virtual camera positions.

14. The method as claimed in claim 1, the method further comprising filling a depth buffer with a transformed depth position of each of the features represented in the video image and/or sensor data.

15. The method as claimed in claim 1, wherein the single view of the feature(s) is formed such that the face(s) and/or eye(s) and/or body of the participant(s) in the video image and/or data are oriented perpendicularly to the direction to them from the virtual camera.

16. The method as claimed in claim 1, wherein the video image and/or sensor data from the array of video camera(s) and/or sensor(s) of the selected feature(s) are combined by forming a triangulated mesh, point cloud or depth buffer of the feature(s); and wherein the triangulated mesh, point cloud or depth buffer of the selected feature(s) is filled with image and/or sensor data of the selected feature(s) from the video camera(s) and/or sensor(s) in the array.

17. The method as claimed in claim 1, wherein the method comprises combining the video image data from the one or more video cameras and/or the data captured by the one or more sensors to form the single view of the feature(s) as appearing to have been captured from the virtual camera; and wherein the method comprises averaging the colour data from the one or more video cameras and/or the data captured by the one or more sensors to form the single view of the feature(s) as appearing to have been captured from the virtual camera.
18. A video conferencing system for capturing data for use in a video conference, the system comprising:
    an array of one or more video cameras and/or one or more sensors for capturing data of a party at a first location;
    wherein the one or more video cameras and/or one or more sensors in the array are located in the same plane;
    wherein the field of view of the one or more video cameras and/or one or more sensors in the array is directed outwards perpendicularly to the plane in which they are located; and
    processing circuitry configured to:
    determine, for each of the one or more video cameras and/or each of the one or more sensors in the array, the three-dimensional position(s) of one or more features represented in the data captured by the video camera or sensor;
    define a virtual camera positioned at a three-dimensional virtual camera position;
    transform the three-dimensional position(s) determined for the feature(s) represented in the data into a common coordinate system to form a single view of the feature(s) as appearing to have been captured from the virtual camera using the video image data from the one or more video cameras and/or the data from the one or more sensors; and
    transmit and/or store the video image and/or sensor data of the feature(s) as viewed from the perspective of the virtual camera(s) and/or data representative of the transformed three-dimensional position(s) of the feature(s); and
    wherein the processing circuitry is further configured to determine a depth component of the three-dimensional position(s) of the feature(s) and transform the image data and/or sensor data of the feature(s) into the common coordinate system using an xy translation inversely proportional to the determined depth of the feature(s).

19. A non-transitory computer readable storage medium storing computer software code which when executing on a data processing system performs a method of capturing data for use in a video conference, the method comprising:
    determining, for each of one or more video cameras and/or one or more sensors in an array, the three-dimensional position(s) of one or more features represented in data of a party at a first location captured by the video camera or sensor;
    wherein the one or more video cameras and/or one or more sensors in the array are located in the same plane;
    wherein the field of view of the one or more video cameras and/or one or more sensors in the array is directed outwards perpendicularly to the plane in which they are located; and
    defining a virtual camera positioned at a three-dimensional virtual camera position;
    transforming the three-dimensional position(s) determined for the feature(s) represented in the data into a common coordinate system to form a single view of the feature(s) as appearing to have been captured from the virtual camera using the video image data from the one or more video cameras and/or the data from the one or more sensors;
    transmitting and/or storing the video image and/or sensor data of the feature(s) viewed from the perspective of the virtual camera and/or data representative of the transformed three-dimensional position(s) of the feature(s); and
    wherein the method further comprises determining a depth component of the three-dimensional position(s) of the feature(s) and transforming the image data and/or the sensor data of the feature(s) into the common coordinate system using an xy translation inversely proportional to the determined depth of the feature(s).
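Independent claims 1, 18 and 19 each recite transforming feature positions into a common coordinate system using an xy translation that is inversely proportional to the determined depth of the feature. The sketch below illustrates one way such a parallax-style shift could be computed; it is a minimal sketch only, assuming a pinhole camera model with a known focal length in pixels, and all function and parameter names are illustrative rather than taken from the patent.

```python
import numpy as np

def shift_to_virtual_view(features_uvz, camera_xy, virtual_xy, focal_px):
    """Shift features seen by one physical camera into a virtual camera's view.

    features_uvz : (N, 3) array of (pixel x, pixel y, depth) for each feature,
                   with depth measured perpendicular to the camera plane.
    camera_xy    : (2,) position of the capturing camera within the array plane.
    virtual_xy   : (2,) position of the virtual camera in the same plane.
    focal_px     : assumed pinhole focal length in pixels.
    """
    feats = np.asarray(features_uvz, dtype=float).copy()
    baseline = np.asarray(virtual_xy, dtype=float) - np.asarray(camera_xy, dtype=float)
    # The pixel shift scales with the baseline between the real and virtual
    # cameras and is inversely proportional to each feature's depth: nearby
    # features move more than distant ones (classic parallax/disparity).
    feats[:, :2] -= focal_px * baseline / feats[:, 2:3]
    return feats
```

Applying this shift to the features from every camera in the array places them all in the virtual camera's coordinate system, from which the single combined view can be formed.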
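Claim 3 recites determining the depth component by triangulating matched features across the camera/sensor data, and claims 9 and 10 recite matching features between pairs of cameras in the array. For two parallel cameras in the same plane, a minimal triangulation sketch under those assumptions is the standard stereo relation; the names below are illustrative, not the patent's.

```python
def triangulate_depth(x_left_px, x_right_px, focal_px, baseline_m):
    """Depth of a feature matched between a pair of coplanar, parallel cameras.

    The horizontal disparity between the feature's pixel positions in the two
    views is inversely proportional to its depth, so depth = f * B / disparity.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("matched feature must have positive disparity")
    return focal_px * baseline_m / disparity
```

The resulting depths can be stored per matched feature in a depth map, 3D point cloud, 3D mesh or depth buffer as in claim 11, and refined using data from further cameras as in claim 12.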
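Claims 14, 16 and 17 describe filling a depth buffer with the transformed depths of the features and combining data from several cameras, including averaging colour data, to form the single virtual-camera view. The following is a rough sketch of one way a depth-tested merge with colour averaging could work; the tolerance, buffer layout and function names are assumptions for illustration only.

```python
import numpy as np

def render_virtual_view(transformed_points, colours, width, height):
    """Merge per-camera features, already shifted into the virtual view,
    into a depth buffer and a colour buffer.

    transformed_points : list of (N_i, 3) arrays (pixel x, pixel y, depth),
                         one per physical camera.
    colours            : list of (N_i, 3) RGB arrays, one per camera.
    """
    depth = np.full((height, width), np.inf)
    colour = np.zeros((height, width, 3))
    count = np.zeros((height, width))
    eps = 0.05  # hypothetical tolerance for treating samples as the same surface
    for pts, cols in zip(transformed_points, colours):
        for (x, y, z), c in zip(pts, cols):
            xi, yi = int(round(x)), int(round(y))
            if not (0 <= xi < width and 0 <= yi < height):
                continue
            if z < depth[yi, xi] - eps:
                # Nearer sample wins the depth test and replaces the pixel.
                depth[yi, xi] = z
                colour[yi, xi] = c
                count[yi, xi] = 1
            elif abs(z - depth[yi, xi]) <= eps:
                # Same surface seen by another camera: average the colours.
                count[yi, xi] += 1
                colour[yi, xi] += (c - colour[yi, xi]) / count[yi, xi]
    return colour, depth
```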