
Framework for video conferencing based on face restoration

Patent No.
US11659193B2
Publication Date
2023-05-23
Applicant
TENCENT AMERICA LLC (US CA Palo Alto)
Inventors
Wei Jiang; Wei Wang; Shan Liu
IPC Classification
H04N19/29; H04N19/30; H04N19/85; H04N19/17; G06T3/40; G06T7/62; G06V40/16; G06N3/045
Technical Field (keywords)
video, EFA, facial, prediction, coded, landmark, intra, in, picture, be
Region: CA, Palo Alto

Abstract

There is included a method and apparatus comprising computer code configured to cause a processor or processors to perform obtaining video data, detecting at least one face from at least one frame of the video data, determining a set of facial landmark features of the at least one face from the at least one frame of the video data, and coding the video data at least partly by a neural network based on the determined set of facial landmark features.
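The four steps in the abstract (obtain video, detect faces, find landmark features, code with a neural network) can be read as a per-frame loop. The sketch below is a hedged illustration only: `code_video` and its three callables (`detect_faces`, `find_landmarks`, `neural_code`) are hypothetical names standing in for a face detector, a landmark model, and the neural-network codec, none of which the abstract specifies.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Frame = List[List[float]]              # placeholder: one grey-scale frame
Box = Tuple[int, int, int, int]        # (x0, y0, x1, y1) face bounding box
Landmarks = List[Tuple[float, float]]  # (x, y) landmark points of one face


def code_video(
    frames: Iterable[Frame],
    detect_faces: Callable[[Frame], Sequence[Box]],
    find_landmarks: Callable[[Frame, Box], Landmarks],
    neural_code: Callable[[Frame, List[Landmarks]], bytes],
) -> List[bytes]:
    """Apply the abstract's steps to every frame (hypothetical hooks, not the patent's code)."""
    coded: List[bytes] = []
    for frame in frames:                                   # 1. obtain video data
        boxes = detect_faces(frame)                        # 2. detect face(s); may be empty
        landmark_sets = [find_landmarks(frame, b) for b in boxes]  # 3. landmark features
        coded.append(neural_code(frame, landmark_sets))    # 4. NN-based coding, conditioned on landmarks
    return coded
```

Passing stub callables (for example, a detector that always returns one box) is enough to exercise the control flow. Keeping the models as injected functions leaves the choice of detector, landmark model, and codec open, as the abstract does.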

Description

CROSS REFERENCE TO RELATED APPLICATION

The present application claims priority to provisional application U.S. 63/134,522, filed on Jan. 6, 2021, which is hereby expressly incorporated by reference, in its entirety, into the present application.

BACKGROUND

1. Field

This disclosure relates to video conferencing involving face restoration (or face hallucination), which recovers realistic details to turn a real low-quality (LQ) face into a high-quality (HQ) one based on landmark features.

2. Description of Related Art

The international standardization organizations ISO/IEC/IEEE are actively exploring AI-based video coding technologies, with a particular focus on technologies based on Deep Neural Networks (DNNs). Various ad hoc groups (AhGs) have been formed to investigate Neural Network Compression (NNR), Video Coding for Machines (VCM), Neural Network-based Video Coding (NNVC), and related topics. The Chinese AITISA and AVS have also established corresponding expert groups to study the standardization of similar technologies.

Claims

What is claimed is:

1. A method for video coding performed by at least one processor, the method comprising:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features by a generative adversarial network.

2. The method according to claim 1,
wherein the at least one face from the at least one frame of the video data is determined to be a largest face among a plurality of faces in the at least one frame of the video data.

3. The method according to claim 1, further comprising:
determining a plurality of sets of facial landmark features, other than the set of facial landmark features of the at least one face from the at least one frame of the video data, with respect to each of the plurality of faces in the at least one frame of the video data; and
coding the video data at least partly by the neural network based on the determined set of facial landmark features and the determined plurality of sets of facial landmark features.

4. The method according to claim 1,
wherein the neural network comprises a deep neural network (DNN).

5. An apparatus for video coding, the apparatus comprising:
at least one memory configured to store computer program code;
at least one processor configured to execute the computer program code to implement:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features by a generative adversarial network.

6. The apparatus according to claim 5,
wherein the at least one face from the at least one frame of the video data is determined to be a largest face among a plurality of faces in the at least one frame of the video data.

7. The apparatus according to claim 5,
wherein the at least one hardware processor is further configured to execute the computer program code to implement:
determining a plurality of sets of facial landmark features, other than the set of facial landmark features of the at least one face from the at least one frame of the video data, with respect to each of the plurality of faces in the at least one frame of the video data; and
coding the video data at least partly by the neural network based on the determined set of facial landmark features and the determined plurality of sets of facial landmark features.

8. A non-transitory computer readable medium storing a program causing a computer to execute a process, the process comprising:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmarks, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features by a generative adversarial network.
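The geometric parts of claims 1 and 2 (choosing the largest face and extending its box into an EFA) and the decoder-side aggregation of an up-sampled sequence with landmark and EFA features can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Face` record, the 25% EFA margin, nearest-neighbour up-sampling, the binary landmark map, and channel stacking as the aggregation operator are all hypothetical choices. The generative adversarial network that produces the reconstructed EFA features is not modelled, so its output would simply be passed to `aggregate` as another map.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) pixel bounds, x1/y1 exclusive
Grid = List[List[float]]         # one H x W channel


@dataclass
class Face:
    """Hypothetical detector output: a face box plus (x, y) landmark points."""
    box: Box
    landmarks: List[Tuple[float, float]]


def box_area(box: Box) -> int:
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def largest_face(faces: Sequence[Face]) -> Face:
    """Pick the face whose box covers the most pixels (cf. claim 2)."""
    return max(faces, key=lambda f: box_area(f.box))


def extended_face_area(box: Box, width: int, height: int, margin: float = 0.25) -> Box:
    """Grow the face box by `margin` of its width/height on each side, clamped
    to the frame. The 0.25 default is an assumption; the claims give no value."""
    x0, y0, x1, y1 = box
    dx = round((x1 - x0) * margin)
    dy = round((y1 - y0) * margin)
    return (max(0, x0 - dx), max(0, y0 - dy), min(width, x1 + dx), min(height, y1 + dy))


def upsample_nearest(channel: Grid, factor: int) -> Grid:
    """Nearest-neighbour up-sampling: repeat each pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)] for row in channel for _ in range(factor)]


def landmark_map(landmarks: Sequence[Tuple[float, float]], width: int, height: int) -> Grid:
    """Rasterise landmark points into a binary H x W map (one assumed encoding)."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in landmarks:
        xi, yi = math.floor(x), math.floor(y)
        if 0 <= xi < width and 0 <= yi < height:
            grid[yi][xi] = 1.0
    return grid


def aggregate(*channels: Grid) -> List[Grid]:
    """Stack equally sized maps along a channel axis (assumed aggregation)."""
    height, width = len(channels[0]), len(channels[0][0])
    for c in channels:
        if len(c) != height or any(len(row) != width for row in c):
            raise ValueError("all channels must share the same H x W shape")
    return list(channels)
```

Up-sampling the decoded frame before stacking is what makes the shapes line up: a frame down-sampled by 2 must be enlarged by the same factor before it can be concatenated with full-resolution landmark and EFA maps.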