At S101, given an input video sequence X={x1, x2, . . . }, such as data 121, a Face Detection & Facial Landmark Extraction module 122 first determines, at S102, one or a plurality of valid faces from each video frame xi. In one embodiment, only the most prominent (e.g., largest) face is detected, and in another embodiment, all faces in the frame satisfying a condition (e.g., having a size exceeding a threshold) are detected. At S103, for the j-th face in xi, a set of facial landmarks is determined and a set of facial landmark features fl,i,j is computed correspondingly, which will be used by the decoder for restoring the j-th face in xi. At S104, all facial landmark features of all faces are put together as Fl,i={fl,i,1, fl,i,2, . . . }, data 124, which is encoded and transmitted by a Landmark Feature Compression & Transmission module 126.

At the same time, at S105, for the j-th face in xi, an Extended Face Area (EFA) can be computed by extending the bounding area of the original detected face (the boundary being a rectangle, an ellipse, or a fine-grained segmentation boundary) to include additional hair, body parts, or even background. At S106, a set of EFA features fb,i,j can be computed correspondingly, which will be used by the decoder for restoring the EFA of the j-th face in xi. At S107, all EFA features of all faces are put together as Fb,i={fb,i,1, fb,i,2, . . . }, data 125, which is encoded and transmitted by an EFA Compression & Transmission module 127.
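
By way of illustration only, the per-frame processing described above may be sketched in Python roughly as follows. The detect_faces, landmark_feats, and efa_feats routines, the scale factor, and the MIN_FACE_AREA threshold are hypothetical placeholders standing in for module 122 and the feature extractors; the sketch assumes the frame xi is a NumPy-style array whose .shape yields (height, width, . . . ), and is not intended as a definitive implementation of the encoder.

# Minimal sketch of the per-frame flow (S102-S107). All function names,
# the scale factor, and the size threshold below are illustrative
# placeholders, not taken from the description above.

MIN_FACE_AREA = 40 * 40   # hypothetical size threshold for a "valid" face


def extend_face_area(bbox, frame_shape, scale=1.6):
    """Extended Face Area (EFA): grow the face bounding box so it also
    covers hair, nearby body parts, and some background, clipped to the
    frame borders."""
    x, y, w, h = bbox
    frame_h, frame_w = frame_shape[:2]
    cx, cy = x + w / 2.0, y + h / 2.0
    ew, eh = int(w * scale), int(h * scale)
    ex, ey = max(0, int(cx - ew / 2)), max(0, int(cy - eh / 2))
    return ex, ey, min(frame_w - ex, ew), min(frame_h - ey, eh)


def process_frame(frame, detect_faces, landmark_feats, efa_feats):
    """Returns (F_l_i, F_b_i): the landmark features and EFA features of
    every valid face in frame x_i, ready to be passed to the two
    compression & transmission modules (126 and 127)."""
    F_l_i, F_b_i = [], []
    for bbox in detect_faces(frame):            # S102: valid faces in x_i
        x, y, w, h = bbox
        if w * h < MIN_FACE_AREA:               # size-threshold embodiment
            continue
        F_l_i.append(landmark_feats(frame, bbox))        # S103: f_l,i,j
        efa_bbox = extend_face_area(bbox, frame.shape)   # S105: EFA
        F_b_i.append(efa_feats(frame, efa_bbox))         # S106: f_b,i,j
    return F_l_i, F_b_i                         # S104 / S107: F_l,i and F_b,i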