What is claimed is:

1. A method for video coding performed by at least one processor, the method comprising:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features, by a generative adversarial network.

2. The method according to claim 1, wherein the at least one face from the at least one frame of the video data is determined to be a largest face among a plurality of faces in the at least one frame of the video data.

3. The method according to claim 1, further comprising:
determining a plurality of sets of facial landmark features, other than the set of facial landmark features of the at least one face from the at least one frame of the video data, with respect to each of a plurality of faces in the at least one frame of the video data; and
coding the video data at least partly by the neural network based on the determined set of facial landmark features and the determined plurality of sets of facial landmark features.

4. The method according to claim 1, wherein the neural network comprises a deep neural network (DNN).

5. An apparatus for video coding, the apparatus comprising:
at least one memory configured to store computer program code; and
at least one processor configured to execute the computer program code to implement:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features, by a generative adversarial network.

6. The apparatus according to claim 5, wherein the at least one face from the at least one frame of the video data is determined to be a largest face among a plurality of faces in the at least one frame of the video data.

7. The apparatus according to claim 5, wherein the at least one processor is further configured to execute the computer program code to implement:
determining a plurality of sets of facial landmark features, other than the set of facial landmark features of the at least one face from the at least one frame of the video data, with respect to each of a plurality of faces in the at least one frame of the video data; and
coding the video data at least partly by the neural network based on the determined set of facial landmark features and the determined plurality of sets of facial landmark features.

8. A non-transitory computer readable medium storing a program causing a computer to execute a process, the process comprising:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features, by a generative adversarial network.
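The decoder-side flow recited in claims 1, 5, and 8 (decompress the bitstream into a down-sampled sequence, up-sample it, determine landmark features, reconstruct EFA features with a generative adversarial network, then aggregate all three for coding) can be sketched as below. This is a minimal illustrative sketch only: every function body is a hypothetical placeholder (nearest-neighbor up-sampling, a dummy landmark detector, a stand-in for a trained GAN generator, and a trivial fusion), not the claimed implementation.

```python
# Illustrative sketch of the decoder-side pipeline in the claims.
# All function bodies are hypothetical stand-ins; a real system would use a
# learned up-sampler, a facial landmark detector, and a trained GAN generator.
import numpy as np

def up_sample(frames: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbor up-sampling of a (T, H, W) down-sampled sequence."""
    return frames.repeat(factor, axis=1).repeat(factor, axis=2)

def detect_landmarks(frame: np.ndarray, n: int = 5) -> np.ndarray:
    """Placeholder landmark 'detector': returns n (y, x) points in the frame."""
    h, w = frame.shape
    ys = np.linspace(h * 0.25, h * 0.75, n)
    xs = np.linspace(w * 0.25, w * 0.75, n)
    return np.stack([ys, xs], axis=1)

def gan_reconstruct_efa(landmarks: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Stand-in for a GAN generator reconstructing EFA features, one feature
    map respective to each facial landmark feature."""
    return np.stack([np.full(frame.shape, i, dtype=float)
                     for i in range(len(landmarks))])

def aggregate(landmarks, efa_features, up_sampled_frame):
    """Aggregate the three inputs named in the coding step of claim 1.
    Here: mean EFA feature map added to the up-sampled frame."""
    return efa_features.mean(axis=0) + up_sampled_frame

# Toy "decompressed" down-sampled sequence: one 4x4 frame.
down_sampled = np.zeros((1, 4, 4))
up = up_sample(down_sampled)            # (1, 8, 8) up-sampled sequence
lm = detect_landmarks(up[0])            # (5, 2) facial landmark features
efa = gan_reconstruct_efa(lm, up[0])    # (5, 8, 8) reconstructed EFA features
fused = aggregate(lm, efa, up[0])       # (8, 8) aggregated coder input
```

Note that both the landmark branch and the EFA branch operate on the same up-sampled sequence, mirroring the two "wherein ... up-sampling the at least one down-sampled sequence" clauses of claim 1.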