Preferably the method comprises identifying (and the processing circuitry is configured to identify) features in the video image data and/or the other sensor data, e.g. as contained in the entire three-dimensional representation of the scene. The features are preferably each assigned a respective three-dimensional position. The features may, for example, be discrete features (such as people and pieces of furniture, or sub-components thereof) or may be defined in a more abstract manner, e.g. by reference to their position in the video image data or other sensor data. Preferably the features comprise at least some of the (e.g. parts of the) participants of the first party.
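By way of illustration only, the following sketch shows one way in which each identified two-dimensional feature could be assigned a respective three-dimensional position, here by sampling a depth map (standing in for the other sensor data) and back-projecting through assumed pinhole-camera intrinsics (fx, fy, cx, cy). The function name, the intrinsics and the depth map are hypothetical; the disclosure does not prescribe this particular method.

```python
# Sketch only: assign a 3D position to each detected 2D feature by
# back-projecting through assumed pinhole-camera intrinsics and a depth map.
import numpy as np

def assign_3d_positions(features_2d, depth_map, fx, fy, cx, cy):
    """Return an (N, 3) array of 3D positions for N (u, v) pixel features."""
    positions = []
    for (u, v) in features_2d:
        z = depth_map[int(v), int(u)]   # depth sampled from the other sensor data
        x = (u - cx) * z / fx           # back-project to camera coordinates
        y = (v - cy) * z / fy
        positions.append((x, y, z))
    return np.array(positions)

# Example with synthetic data: two features in a 480x640 depth map.
depth = np.full((480, 640), 2.0)        # everything 2 m from the camera
features = [(320.0, 240.0), (100.0, 50.0)]
print(assign_3d_positions(features, depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0))
```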
The features whose positions are determined may be identified in the video image data or other sensor data in any suitable and desired way. In one embodiment, the features are identified using image recognition, e.g. to recognise people (and, e.g., their faces), furniture, etc. (or sub-components thereof). In one embodiment, the features are identified using feature recognition, e.g. by looking for areas in the video image data or other sensor data having high contrast. This may involve identifying parts of the image that contain clearly identifiable features or that have a high degree of “uniqueness”. This helps to identify the borders of features and may, for example, allow features to be identified (e.g. in an abstract manner) without having to perform detailed image recognition.
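Purely as an illustrative sketch, the snippet below shows one possible form of such feature recognition, using OpenCV's Shi-Tomasi corner detector to pick out high-contrast, distinctive ("unique") points in a frame. The detector, its parameters and the synthetic frame are assumptions for the example; the disclosure does not mandate any particular detector, and an image-recognition step (e.g. a face or object recogniser) could be used instead or in addition where discrete features such as people or furniture are to be identified.

```python
# Sketch only: find high-contrast, distinctive points with Shi-Tomasi corners.
import cv2
import numpy as np

# Synthetic grayscale frame standing in for one frame of the video image data:
# a dark background with a bright rectangle whose corners are high-contrast.
frame = np.zeros((480, 640), dtype=np.uint8)
cv2.rectangle(frame, (200, 150), (440, 330), color=255, thickness=-1)

# Detect up to 50 distinctive points, rejecting weak candidates and points
# closer than 10 px to a stronger one.
corners = cv2.goodFeaturesToTrack(frame, maxCorners=50,
                                  qualityLevel=0.01, minDistance=10)

for corner in corners:
    u, v = corner.ravel()
    print(f"feature at pixel ({u:.0f}, {v:.0f})")
```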