In the same or another embodiment, FIG. 12 shows an example of a video conference based on the multi-layered sub-picture method. A video stream contains one base layer video bitstream corresponding to the background picture and one or more enhancement layer video bitstreams corresponding to foreground sub-pictures. Each enhancement layer video bitstream corresponds to a CSPS layer. On a display, the picture corresponding to the base layer is displayed by default; it contains one or more users' pictures as picture-in-picture (PIP) regions. When a specific user is selected through a client's control, the enhancement CSPS layer corresponding to the selected user is decoded and displayed with enhanced quality or spatial resolution. FIG. 13 shows a diagram of this operation.
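As a non-authoritative illustration of the client-side selection described above, the following minimal sketch decides which layers to decode. It assumes a hypothetical `Layer` record that tags each layer as the base layer or as the enhancement CSPS layer of a named user; the names `Layer`, `layers_to_decode`, and the user labels are illustrative only and are not part of any codec API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of one layer in the conference stream."""
    layer_id: int
    user: Optional[str]  # None for the base (background) layer
    is_base: bool


def layers_to_decode(layers: List[Layer], selected_user: Optional[str]) -> List[int]:
    """Return the ids of the layers to decode.

    The base layer (background picture with PIP regions) is always decoded.
    If a user is selected, that user's enhancement CSPS layer is added so the
    selected user is shown with enhanced quality or spatial resolution.
    """
    ids = [layer.layer_id for layer in layers if layer.is_base]
    if selected_user is not None:
        ids += [
            layer.layer_id
            for layer in layers
            if not layer.is_base and layer.user == selected_user
        ]
    return ids


# Illustrative stream: one base layer and two per-user enhancement layers.
stream = [
    Layer(0, None, True),
    Layer(1, "alice", False),
    Layer(2, "bob", False),
]

print(layers_to_decode(stream, None))   # [0]
print(layers_to_decode(stream, "bob"))  # [0, 2]
```

Selecting a user therefore only adds one enhancement layer on top of the default base layer; users who are not selected remain visible at base-layer quality inside their PIP regions.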
In the same or another embodiment, a network middle box (such as a router) may select a subset of layers to send to a user depending on the available bandwidth. The picture/sub-picture organization may be used for bandwidth adaptation. For instance, if the user does not have sufficient bandwidth, the router may strip off layers or select some sub-pictures based on their importance or on the setup in use, and this can be done dynamically to adapt to the available bandwidth.
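One possible middle-box policy can be sketched as follows. This is a minimal sketch under stated assumptions, not the described method itself: it assumes each layer advertises a bitrate and a hypothetical integer `importance` score, that enhancement layers depend only on the base layer, and that the base layer is always forwarded. It then greedily adds enhancement layers in order of importance while they fit the bandwidth budget. The names `LayerInfo` and `select_layers` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerInfo:
    """Hypothetical per-layer metadata known to the middle box."""
    layer_id: int
    bitrate_kbps: int
    importance: int  # assumed metric: higher means more important
    is_base: bool


def select_layers(layers: List[LayerInfo], budget_kbps: int) -> List[int]:
    """Choose which layers to forward under a bandwidth budget.

    Assumption: the base layer is always forwarded (even if it alone exceeds
    the budget) and enhancement layers depend only on the base layer.
    Enhancement layers are then added greedily by descending importance,
    skipping any layer that would exceed the remaining budget.
    """
    selected = [layer.layer_id for layer in layers if layer.is_base]
    used = sum(layer.bitrate_kbps for layer in layers if layer.is_base)

    enhancement = sorted(
        (layer for layer in layers if not layer.is_base),
        key=lambda layer: (-layer.importance, layer.layer_id),
    )
    for layer in enhancement:
        if used + layer.bitrate_kbps <= budget_kbps:
            selected.append(layer.layer_id)
            used += layer.bitrate_kbps
    return sorted(selected)


layers = [
    LayerInfo(0, 500, 0, True),   # base layer (background picture)
    LayerInfo(1, 800, 1, False),
    LayerInfo(2, 600, 2, False),
    LayerInfo(3, 300, 0, False),
]

print(select_layers(layers, 1200))  # [0, 2]
print(select_layers(layers, 2000))  # [0, 1, 2]
print(select_layers(layers, 400))   # [0]
```

To adapt dynamically, the middle box would re-run the selection whenever its bandwidth estimate for the user changes. A real system might also weigh user setup preferences or inter-layer dependencies, which this sketch omits.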