Voice assistant system, server apparatus, device, voice assistant method therefor, and program to be executed by computer

Patent No.: US10867596B2
Publication date: 2020-12-15
Applicant: LENOVO (Singapore) PTE. LTD. (SG New Tech Park)
Inventors: Masaharu Yoneda; Kazuhiro Kosugi; Koji Kawakita
IPC classification: G10L15/00; G10L15/07; G10L15/08; G10L21/0208; G10L15/18; G10L15/32; G06F3/16; G10L15/22; G10L13/02; G10L15/30
Technical field: voice, assistant, speech, module, user's, server, recording, in, or, devices
Region: New Tech Park

Abstract

A voice assistant system includes a server apparatus performing voice assistant and a plurality of devices, in which the server apparatus and the devices are communicatively connected to each other. The plurality of devices each records the same user's speech through a microphone, and then transmits recorded data of the same user's speech to the server apparatus. The server apparatus receives the recorded data transmitted from each of the plurality of devices, and then voice-recognizes two or more of the received recorded data in accordance with a predetermined standard to thereby interpret the contents of the user's speech to perform the voice assistant.
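The abstract leaves the “predetermined standard” open. Claims 4 to 7 describe one form of it: the server weights each device's recorded data by recording state information (recording level, noise level, and echo), and a device's reliability is the sum of values representing those three quantities. The Python sketch below illustrates that idea under assumptions the patent does not state. It assumes each recording has already been transcribed on its own, that each recording-state quantity has been converted to a score where larger means better, and that the transcripts are merged by reliability-weighted voting. `Recording`, `reliability`, and `interpret` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical upload from one device: a transcript plus recording-state scores."""
    device_id: str
    transcript: str      # assumed: result of recognizing this device's recording alone
    level_score: float   # assumed scale: higher means a better recording level
    noise_score: float   # assumed scale: higher means less noise
    echo_score: float    # assumed scale: higher means less echo


def reliability(rec: Recording) -> float:
    """Reliability as the plain sum of the three recording-state values."""
    return rec.level_score + rec.noise_score + rec.echo_score


def interpret(recordings: list[Recording], min_devices: int = 2) -> str:
    """Return the transcript whose supporting devices have the largest total reliability."""
    if len(recordings) < min_devices:
        raise ValueError("need recorded data from at least two devices")
    votes: dict[str, float] = defaultdict(float)
    for rec in recordings:
        votes[rec.transcript] += reliability(rec)  # weight each device's vote
    return max(votes, key=votes.__getitem__)


if __name__ == "__main__":
    uploads = [
        Recording("pc", "turn on the lights", 3, 2, 2),
        Recording("phone", "turn on the light", 2, 1, 1),
        Recording("speaker", "turn on the lights", 1, 1, 2),
    ]
    print(interpret(uploads))  # turn on the lights
```

With these sample scores, the two devices that heard “turn on the lights” contribute 7 + 4 = 11, against 4 for the phone, so that transcript wins. Weighting rather than a plain majority vote lets one clear recording outweigh several noisy ones that agree with each other.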

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims priority to Japan Patent Application No. JP2017-154571, filed on 9 Aug. 2017 for Masaharu Yoneda, et al., the entire contents of which are incorporated herein by reference for all purposes.

FIELD

The present disclosure relates to a voice assistant system, a server apparatus, a device, a voice assistant method, and a program to be executed by a computer.

BACKGROUND

In recent years, voice assistants have been built into many kinds of devices. A voice assistant is a function that interprets a user's speech in order to answer questions or carry out operations instructed by voice. It generally interprets the content of what the user says by using techniques such as voice recognition and natural language processing.

In general, the voice assistant function starts when the user speaks a predetermined keyword (a verbal start command), after which the user can use the voice assistant. At present, each device carries its own voice assistant, so the user must use the correct keyword for whichever device is being addressed, which is inconvenient. For example, the keyword is “Hey Cortana” for Windows (Registered Trademark) machines, “Hey Siri” for iPhone (Registered Trademark) terminals, and “OK Google” for Android terminals.

Claims

What is claimed is:

1. A system comprising:
a server apparatus performing a voice assistant function; and
a plurality of devices communicatively connected to each other and the server apparatus,
wherein,
the plurality of devices each records a user's speech through a microphone on each device, and then each transmits recorded data of the user's speech to the server apparatus,
the server apparatus receives the recorded data transmitted from each of the plurality of devices, and then voice-recognizes two or more of the received recorded data in accordance with a predetermined standard to thereby interpret a content of the user's speech to perform the voice assistant, and
the server apparatus selects a device outputting the voice assistant among the plurality of devices according to a predetermined priority, the predetermined priority determined based on analyzing different factors for each of the plurality of devices, the factors comprising a state where the device is in use or not in use, a type of an output unit used in the device, a distance between the device and a user, and performance of the output unit of the device, the server apparatus analyzing the different factors in the foregoing order.

2. The system according to claim 1, wherein:
the plurality of devices starts the recording of the user's speech after a user's predetermined verbal start command is input through the microphone of a device of the plurality of devices.

3. The system according to claim 1, wherein:
the plurality of devices each further transmits recording state information indicating a recording state in recording the user's speech to the server apparatus.

4. The system according to claim 3, wherein:
the server apparatus interprets the content of the user's speech while performing weighting of the received recorded data according to the recording state information of the two or more of the received recorded data.

5. The system according to claim 4, wherein:
the recording state information includes at least one of a recording level, a noise level, and an echo.

6. The system according to claim 4, wherein:
the recording state information includes all information of a recording level, a noise level, and an echo.

7. The system of claim 6, wherein the recording state describes a reliability of the plurality of devices, the reliability comprising a sum of values representing the recording level, the noise level, and the echo.

8. The system of claim 1, wherein:
the type of output unit used in the device refers to whether a headset or speaker is being used, with priority given to the headset over the speaker;
the distance between the device and the user is classified into distance ranges, where shorter distance ranges are given priority over longer distance ranges; and
the performance of the output unit of the device refers to giving priority to a device with an output unit that is loudest or largest.

9. An apparatus comprising:
a communication module for performing data communication with a plurality of devices through a network;
a voice recognition module for voice-recognizing recorded data of a same user's speech received through the communication module and transmitted from each of the plurality of devices in accordance with a predetermined standard to thereby interpret a content of the user's speech; and
a voice assistant module for:
performing voice assistant according to the content of the user's speech interpreted in the voice recognition module; and
selecting a device for outputting the voice assistant among the plurality of devices according to a predetermined priority, the predetermined priority determined based on analyzing different factors for each of the plurality of devices, the factors comprising a state where the device is in use or not in use, a type of an output unit used in the device, a distance between the device and a user, and performance of the output unit of the device, the server apparatus analyzing the different factors in the foregoing order.

10. The apparatus according to claim 9, wherein:
the voice recognition module interprets the content of the user's speech while performing weighting of a plurality of received recorded data according to recording state information indicating a recording state of each of the plurality of received recorded data transmitted from the plurality of devices.

11. The apparatus according to claim 10, wherein:
the recording state information includes at least one of a recording level, a noise level, and an echo.

12. The apparatus according to claim 10, wherein:
the recording state information includes all information of a recording level, a noise level, and an echo.

13. The apparatus of claim 12, wherein the recording state describes a reliability of the plurality of devices, the reliability comprising a sum of values representing the recording level, the noise level, and the echo.

14. The apparatus of claim 9, wherein:
the type of output unit used in the device refers to whether a headset or speaker is being used, with priority given to the headset over the speaker;
the distance between the device and the user is classified into distance ranges, where shorter distance ranges are given priority over longer distance ranges; and
the performance of the output unit of the device refers to giving priority to a device with an output unit that is loudest or largest.

15. A program product comprising a non-transitory computer-readable storage medium that stores code executable by a processor, the executable code comprising code to perform:
a voice recognition process of voice-recognizing recorded data of a same user's speech received through a communication module and transmitted from each of a plurality of devices in accordance with a predetermined standard to thereby interpret a content of the user's speech; and
a voice assistant process of:
performing voice assistant according to the contents of the user's speech interpreted in the voice recognition process; and
selecting a device for outputting the voice assistant among the plurality of devices according to a predetermined priority, the predetermined priority determined based on analyzing different factors for each of the plurality of devices, the factors comprising a state where the device is in use or not in use, a type of an output unit used in the device, a distance between the device and a user, and performance of the output unit of the device, the server apparatus analyzing the different factors in the foregoing order.

16. The program product of claim 15, wherein the executable code further comprises code to perform interpreting content of the user's speech while performing weighting of a plurality of received recorded data according to a recording state information indicating a recording state of each of the plurality of received recorded data transmitted from the plurality of devices.

17. The program product of claim 16, wherein:
the recording state information includes at least one of a recording level, a noise level, and an echo.

18. The program product of claim 16, wherein:
the recording state information includes all information of a recording level, a noise level, and an echo.

19. The program product of claim 18, wherein the recording state describes a reliability of the plurality of devices, the reliability comprising a sum of values representing the recording level, the noise level, and the echo.

20. The program product of claim 15, wherein:
the type of output unit used in the device refers to whether a headset or speaker is being used, with priority given to the headset over the speaker;
the distance between the device and the user is classified into distance ranges, where shorter distance ranges are given priority over longer distance ranges; and
the performance of the output unit of the device refers to giving priority to a device with an output unit that is loudest or largest.
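Claims 1 and 8 fix the order in which the output-device factors are analyzed. The factors are the in-use state, then the output unit type (headset before speaker), then the distance range (shorter first), and then the output performance (loudest or largest first). Comparing factors in a fixed order, where a later factor only breaks ties left by earlier ones, is a lexicographic comparison. The Python sketch below expresses that comparison as a sort key. The claims do not say whether an in-use device ranks above an idle one, where the distance ranges begin and end, or how performance is measured. The sketch therefore assumes that in-use wins, uses hypothetical range limits of 1, 3, and 5 metres, and treats performance as a single number where larger is better. All names are illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum


class OutputType(IntEnum):
    # Lower value = higher priority; claim 8 ranks a headset above a speaker.
    HEADSET = 0
    SPEAKER = 1


# Hypothetical range limits in metres; the claims only say that distances
# are grouped into ranges and that shorter ranges take priority.
DISTANCE_RANGE_LIMITS_M = (1.0, 3.0, 5.0)


def distance_range(distance_m: float) -> int:
    """Map a distance to a range index (0 = closest range)."""
    for index, limit in enumerate(DISTANCE_RANGE_LIMITS_M):
        if distance_m <= limit:
            return index
    return len(DISTANCE_RANGE_LIMITS_M)


@dataclass
class Candidate:
    device_id: str
    in_use: bool                # assumption: a device in use is preferred
    output_type: OutputType
    distance_m: float
    output_performance: float   # assumed metric, e.g. maximum volume; larger is better


def priority_key(c: Candidate) -> tuple:
    # Tuples compare element by element, so the factors are analyzed
    # in the order listed in claim 1; smaller keys mean higher priority.
    return (
        0 if c.in_use else 1,
        int(c.output_type),
        distance_range(c.distance_m),
        -c.output_performance,
    )


def select_output_device(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest priority (smallest key)."""
    return min(candidates, key=priority_key)


if __name__ == "__main__":
    devices = [
        Candidate("tv", False, OutputType.SPEAKER, 2.0, 80.0),
        Candidate("pc", True, OutputType.SPEAKER, 4.0, 40.0),
        Candidate("phone", True, OutputType.HEADSET, 0.5, 10.0),
    ]
    print(select_output_device(devices).device_id)  # phone
```

Here "pc" and "phone" tie on the first factor because both are in use. The headset then decides in favor of "phone" before distance or loudness are considered. This matches the claims' statement that the factors are analyzed in the foregoing order.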