In the same or another embodiment, a picture may consist of one or more foreground sub-pictures, with or without a background sub-picture. A sub-picture in layer a may be further partitioned into multiple sub-pictures in the same layer. One or more coded sub-pictures in layer b may reference the partitioned sub-picture in layer a.
In the same or another embodiment, a coded video sequence (CVS) may be a group of coded pictures. The CVS may consist of one or more coded sub-picture sequences (CSPS), where a CSPS may be a group of coded sub-pictures covering the same local region of the picture. A CSPS may have the same temporal resolution as the CVS or a different one.
In the same or another embodiment, a CSPS may be coded and contained in one or more layers. A CSPS may consist of one or more CSPS layers. Decoding one or more CSPS layers corresponding to a CSPS may reconstruct a sequence of sub-pictures corresponding to the same local region.
In the same or another embodiment, the number of CSPS layers corresponding to a CSPS may be identical to or different from the number of CSPS layers corresponding to another CSPS.
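As a non-limiting illustration, the containment hierarchy described above (a CVS containing one or more CSPSs, each CSPS containing one or more CSPS layers) may be sketched as a simple data model. The class names, field names, region coordinates, and frame rates below are hypothetical and chosen only for illustration; they are not syntax elements of any bitstream or codec.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CSPSLayer:
    """One layer of a coded sub-picture sequence (hypothetical model)."""
    layer_id: int
    frame_rate: float  # temporal resolution of this layer, in frames per second


@dataclass
class CSPS:
    """A coded sub-picture sequence covering one local region of the picture."""
    region: Tuple[int, int, int, int]  # (x, y, width, height), assumed layout
    layers: List[CSPSLayer] = field(default_factory=list)


@dataclass
class CVS:
    """A coded video sequence made up of one or more CSPSs."""
    frame_rate: float
    sub_picture_sequences: List[CSPS] = field(default_factory=list)


# A background CSPS with a single layer and a foreground CSPS with two layers.
background = CSPS(region=(0, 0, 1920, 1080), layers=[CSPSLayer(0, 30.0)])
foreground = CSPS(
    region=(640, 360, 640, 360),
    layers=[CSPSLayer(0, 30.0), CSPSLayer(1, 60.0)],
)
cvs = CVS(frame_rate=30.0, sub_picture_sequences=[background, foreground])

# The number of CSPS layers may differ from one CSPS to another.
layer_counts = [len(csps.layers) for csps in cvs.sub_picture_sequences]
print(layer_counts)  # [1, 2]
```

In this sketch the foreground CSPS carries one more layer than the background CSPS, and its second layer has a frame rate different from that of the CVS, consistent with the embodiments above.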
In the same or another embodiment, a CSPS layer may have a different temporal resolution (e.g. frame rate) from another CSPS layer. The original (uncompressed) sub-picture sequence may be temporally re-sampled (up-sampled or down-sampled), coded with different temporal resolution parameters, and contained in a bitstream corresponding to a layer.
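As a non-limiting illustration of temporal re-sampling, the sketch below derives a sub-picture sequence at a target frame rate by dropping frames (down-sampling) or repeating frames (up-sampling). This nearest-previous-frame approach is an assumption made for simplicity; an encoder may instead use interpolation or any other re-sampling method. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_temporally(frames: Sequence[T], src_fps: float, dst_fps: float) -> List[T]:
    """Re-sample a frame sequence from src_fps to dst_fps.

    Each output frame is the source frame at or immediately before the
    output timestamp, so down-sampling drops frames and up-sampling
    repeats them. No interpolation is performed.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Keep the overall duration of the sequence unchanged.
    n_out = round(len(frames) * dst_fps / src_fps)
    last = len(frames) - 1
    return [frames[min(int(i * src_fps / dst_fps), last)] for i in range(n_out)]


# Frames are represented here by their indices in the original sequence.
original = list(range(8))

down = resample_temporally(original, src_fps=60.0, dst_fps=30.0)
print(down)  # [0, 2, 4, 6]

up = resample_temporally(original[:4], src_fps=30.0, dst_fps=60.0)
print(up)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Each re-sampled sequence could then be coded with its own temporal resolution parameters and carried in the bitstream of the corresponding layer.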