Abstract

A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.

Description

In some examples, speech recognition machine 130 may be configured to segment speech audio into words (e.g., using an LSTM trained to recognize word boundaries, and/or separating words based on silences or amplitude differences between adjacent words). In some examples, speech recognition machine 130 may classify individual words to assess lexical data for each individual word (e.g., character sequences, word sequences, n-grams). In some examples, speech recognition machine 130 may employ dependency and/or constituency parsing to derive a parse tree for the lexical data. In some examples, speech recognition machine 130 may operate AI and/or ML models (e.g., an LSTM) to translate speech audio, and/or vectors representing speech audio in the learned representation space, into lexical data, wherein translating a word in the sequence is based on the speech audio at a current time and further based on an internal state of the AI and/or ML models representing previous words from previous times in the sequence. Translating a word from speech audio to lexical data in this fashion may capture relationships between words that are potentially informative for speech recognition, e.g., recognizing a potentially ambiguous word based on a context of previous words, and/or recognizing a mispronounced word based on a context of previous words. Accordingly, speech recognition machine 130 may be able to robustly recognize speech, even when such speech includes ambiguities, mispronunciations, etc.
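As a non-limiting illustration, the option of separating words based on silences or amplitude differences might be sketched as follows. This is a minimal sketch and not the implementation of speech recognition machine 130: the function name `segment_by_silence`, the frame size, the amplitude threshold, and the minimum silence length are all hypothetical choices made for the example, and a trained model (e.g., an LSTM) could be used instead.

```python
from typing import List, Tuple


def segment_by_silence(
    samples: List[float],
    frame_size: int = 4,          # hypothetical: samples per analysis frame
    threshold: float = 0.1,       # hypothetical: mean |amplitude| below this is "silent"
    min_silence_frames: int = 2,  # hypothetical: silent frames needed to end a word
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index ranges of candidate word segments."""
    segments: List[Tuple[int, int]] = []
    start = None      # sample index where the current segment began
    silent_run = 0    # consecutive silent frames seen inside a segment
    n_frames = len(samples) // frame_size

    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        energy = sum(abs(x) for x in frame) / frame_size
        if energy >= threshold:
            if start is None:
                start = i * frame_size
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first silent frame of the run.
                end = (i - silent_run + 1) * frame_size
                segments.append((start, end))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended mid-segment; drop any trailing silent frames.
        segments.append((start, (n_frames - silent_run) * frame_size))
    return segments


# Two bursts of sound separated by a pause, then a short trailing silence.
audio = [0.5] * 8 + [0.0] * 8 + [0.4] * 4 + [0.0] * 4
print(segment_by_silence(audio))
```

Each returned range could then be passed to a downstream word classifier. In this sketch, a pause shorter than `min_silence_frames` does not split a segment, which is one simple way to tolerate brief dips in amplitude within a word.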

Computerized intelligent assistant for conferences
