According to an embodiment, the method comprises decoding one or more indications for a first type of pictures, such as RAP pictures, and decoding one or more indications for a second type of pictures, such as non-RAP pictures.
According to an embodiment, the method comprises decoding the indications separately for different types of scalability, different sets of scalability layers, and/or different sets of temporal sub-layers.
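The idea of keeping separately decoded indications per picture type and per set of temporal sub-layers can be illustrated with a small lookup structure. This is only a sketch under stated assumptions: the class and method names (`SeparateIndications`, `add`, `lookup`) are hypothetical, the payload of each indication is kept opaque (a set of labels) because its syntax is not specified here, and the temporal sub-layer is identified by a `temporal_id` value in the manner of HEVC's TemporalId.

```python
from dataclasses import dataclass, field
from enum import Enum


class PictureType(Enum):
    RAP = "rap"          # random access point picture
    NON_RAP = "non_rap"  # any other picture


@dataclass
class SeparateIndications:
    """Hypothetical container for indications decoded separately per
    (picture type, range of temporal sub-layers). The payload is opaque."""
    # key: (PictureType, (min_temporal_id, max_temporal_id)) -> payload
    table: dict = field(default_factory=dict)

    def add(self, pic_type, tid_range, payload):
        # Store one decoded indication for this picture type / sub-layer set.
        self.table[(pic_type, tid_range)] = frozenset(payload)

    def lookup(self, is_rap, temporal_id):
        # Pick the indication that applies to a picture of the given type
        # and temporal sub-layer; fail loudly if none was decoded.
        pic_type = PictureType.RAP if is_rap else PictureType.NON_RAP
        for (ptype, (lo, hi)), payload in self.table.items():
            if ptype is pic_type and lo <= temporal_id <= hi:
                return payload
        raise KeyError(f"no indication for {pic_type.name}, temporal_id={temporal_id}")


# Usage with made-up payloads: RAP pictures and non-RAP pictures
# carry different indications.
ind = SeparateIndications()
ind.add(PictureType.RAP, (0, 0), {"sample"})
ind.add(PictureType.NON_RAP, (0, 2), {"sample", "motion"})
```

Keying on a tuple keeps the per-type and per-sub-layer separation explicit; the same pattern would extend to a scalability-type or layer-set component of the key.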
An apparatus according to a sixth embodiment comprises a video decoder configured to decode a bitstream comprising a base layer, a first enhancement layer and a second enhancement layer, the video decoder being configured for: interpreting, from the bitstream, an indication that both the base layer and the first enhancement layer are used for prediction of the second enhancement layer; interpreting, from the bitstream, an indication of a first set of prediction types that is applicable from the base layer to the second enhancement layer, wherein the first set of prediction types is a subset of all prediction types available for inter-layer prediction; interpreting, from the bitstream, an indication of a second set of prediction types that is applicable from the first enhancement layer to the second enhancement layer, wherein the second set of prediction types is a subset of all prediction types available for inter-layer prediction; and decoding the second enhancement layer using only the first set of prediction types from the base layer and the second set of prediction types from the first enhancement layer.
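The decoder behaviour of the sixth embodiment, where each reference layer of the second enhancement layer has its own permitted subset of prediction types, can be sketched as a per-reference-layer allow-list that gates every inter-layer prediction. The catalogue of prediction types (`"sample"`, `"motion"`, `"residual"`), the layer identifiers, and the function names are hypothetical and chosen for illustration only; the input is assumed to be already-parsed indications rather than actual bitstream syntax.

```python
# Hypothetical catalogue of all prediction types available between layers
# (assumed; the passage does not enumerate them).
ALL_PREDICTION_TYPES = frozenset({"sample", "motion", "residual"})

BASE, EL1, EL2 = 0, 1, 2  # hypothetical layer identifiers


def interpret_dependencies(ref_layers, type_sets):
    """Map each indicated reference layer to its permitted prediction types.

    ref_layers: layers indicated as used for predicting the target layer.
    type_sets:  {ref_layer: prediction types decoded for that layer}.
    """
    allowed = {}
    for ref in ref_layers:
        types = frozenset(type_sets[ref])
        # Each indicated set must be a subset of all available types.
        if not types <= ALL_PREDICTION_TYPES:
            unknown = sorted(types - ALL_PREDICTION_TYPES)
            raise ValueError(f"unknown prediction type(s) for layer {ref}: {unknown}")
        allowed[ref] = types
    return allowed


def check_prediction(allowed, ref, pred_type):
    """True only if pred_type from reference layer ref is permitted."""
    return pred_type in allowed.get(ref, frozenset())


# Usage: EL2 may use only sample prediction from the base layer, but both
# sample and motion prediction from the first enhancement layer.
allowed = interpret_dependencies(
    [BASE, EL1], {BASE: {"sample"}, EL1: {"sample", "motion"}}
)
```

A decoder built this way would call `check_prediction` before applying any inter-layer prediction to a block of the second enhancement layer, so prediction types outside the indicated subsets are never used.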