
Computerized intelligent assistant for conferences

Patent No.
US10867610B2
Publication Date
2020-12-15
Applicant
Microsoft Technology Licensing, LLC (Redmond, WA, US)
Inventors
Adi Diamant; Karen Master Ben-Dor; Eyal Krupka; Raz Halaly; Yoni Smolin; Ilya Gurvich; Aviv Hurvitz; Lijuan Qin; Wei Xiong; Shixiong Zhang; Lingfeng Wu; Xiong Xiao; Ido Leichter; Moshe David; Xuedong Huang; Amit Kumar Agarwal
IPC Classification
H04N7/14; G10L15/26; H04N7/15; G06K9/00; G10L17/00
Technical Field
conference, transcript, assistant, speech, machine, remote
Region: Redmond, WA

Abstract

A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.
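As a hedged illustration of the flow summarized in the abstract, the following Python sketch combines recognized faces and recognized speech into an attributed transcript. The function names, recognizer interfaces, and stub inputs are assumptions made for illustration only, not the patent's actual components.

```python
# Minimal sketch of the attribution flow: face recognition identifies the
# participant, speech recognition produces text, and each text segment is
# attributed to that participant in the resulting transcript.
from typing import Callable, List, Tuple

def build_transcript(
    segments: List[Tuple[bytes, bytes]],        # (video, audio) per segment
    recognize_face: Callable[[bytes], str],     # stands in for the face recognition machine
    recognize_speech: Callable[[bytes], str],   # stands in for the speech recognition machine
) -> List[Tuple[str, str]]:
    """Attribute each recognized text to the participant recognized in the video."""
    transcript = []
    for video, audio in segments:
        participant = recognize_face(video)     # who is speaking on camera
        text = recognize_speech(audio)          # computer-readable audio -> text
        transcript.append((participant, text))  # attribution + transcription
    return transcript

# Toy usage with stub recognizers standing in for the trained machines
stub_segments = [(b"video-1", b"audio-1"), (b"video-2", b"audio-2")]
print(build_transcript(
    stub_segments,
    recognize_face=lambda v: f"participant-{v[-1:].decode()}",
    recognize_speech=lambda a: f"text from {a.decode()}",
))
```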

Description

In some examples, speech recognition machine 130 may be configured to segment speech audio into words (e.g., using an LSTM trained to recognize word boundaries, and/or separating words based on silences or amplitude differences between adjacent words). In some examples, speech recognition machine 130 may classify individual words to assess lexical data for each individual word (e.g., character sequences, word sequences, n-grams). In some examples, speech recognition machine 130 may employ dependency and/or constituency parsing to derive a parse tree for lexical data. In some examples, speech recognition machine 130 may operate AI and/or ML models (e.g., an LSTM) to translate speech audio, and/or vectors representing speech audio in the learned representation space, into lexical data, wherein translating a word in the sequence is based on the speech audio at a current time and further based on an internal state of the AI and/or ML models representing previous words from previous times in the sequence. Translating a word from speech audio to lexical data in this fashion may capture relationships between words that are potentially informative for speech recognition, e.g., recognizing a potentially ambiguous word based on a context of previous words, and/or recognizing a mispronounced word based on a context of previous words. Accordingly, speech recognition machine 130 may be able to robustly recognize speech, even when such speech may include ambiguities, mispronunciations, etc.
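As a rough sketch of the LSTM-based translation described above, the following PyTorch example shows a recurrent model whose hidden state carries context from previous frames and words while it classifies the current audio features into lexical tokens. The class name, feature dimensions, and vocabulary size are illustrative assumptions, not the patent's implementation.

```python
# Sketch: an LSTM maps a sequence of acoustic feature vectors to word-level
# lexical predictions; its internal state conditions each prediction on the
# audio at the current time and on previously recognized words.
import torch
import torch.nn as nn

class SpeechToLexicalLSTM(nn.Module):
    def __init__(self, feature_dim=40, hidden_dim=256, vocab_size=10000):
        super().__init__()
        # LSTM over acoustic frames; its hidden state summarizes prior context
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # Projects each hidden state to a distribution over the word vocabulary
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frames, state=None):
        # frames: (batch, time, feature_dim) acoustic features (e.g. log-mel)
        outputs, state = self.lstm(frames, state)
        logits = self.classifier(outputs)   # (batch, time, vocab_size)
        return logits, state                # state carries context forward

# Usage: classify a 3-second utterance (300 frames of 40-dim features)
model = SpeechToLexicalLSTM()
frames = torch.randn(1, 300, 40)
logits, state = model(frames)
words = logits.argmax(dim=-1)  # most likely lexical token per frame
```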

Claims

1