As described above, according to this embodiment, the plurality of devices 20 each record the same user's speech through the microphone 22 and then transmit the recorded data of that speech to the server apparatus 10. The server apparatus 10 then performs voice recognition on the recorded data transmitted from each of the plurality of devices 20 in accordance with a predetermined standard, thereby interpreting the contents of the user's speech and executing the voice assistant. This enables a user to utilize the voice assistant easily, without having to decide which device to use, even when two or more devices capable of utilizing the voice assistant are available. Moreover, because the voice recognition of the same user's speech is performed using the recorded data from the plurality of devices, the accuracy of the voice recognition can be increased.
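The server-side processing summarized above may be illustrated by the following minimal sketch. It is not the implementation of the embodiment: the names `RecordedData` and `interpret_speech` are hypothetical, the recognizer is passed in as a placeholder function, and the "predetermined standard" is assumed here, purely for illustration, to be "adopt the recognition result with the highest confidence".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RecordedData:
    """Recorded data received from one device 20 (hypothetical structure)."""
    device_id: str
    samples: List[float]  # PCM samples normalized to [-1.0, 1.0]


# A recognizer maps samples to (recognized text, confidence in [0, 1]).
# Any real speech recognition engine would be wrapped to fit this shape.
Recognizer = Callable[[List[float]], Tuple[str, float]]


def interpret_speech(recordings: List[RecordedData], recognize: Recognizer) -> str:
    """Recognize every device's recording of the same speech and return
    the text chosen by the assumed standard (highest confidence wins)."""
    if not recordings:
        raise ValueError("no recorded data received")

    best_text, best_confidence = "", -1.0
    for rec in recordings:
        text, confidence = recognize(rec.samples)
        if confidence > best_confidence:
            best_text, best_confidence = text, confidence
    return best_text
```

Under this assumption, a device located far from the user, whose recording yields a low-confidence result, does not degrade the interpretation, because the result from a better-placed device is adopted instead.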
Moreover, according to this embodiment, the plurality of devices 20 may start recording the user's speech after the user's predetermined verbal start command is input through the microphone 22. This allows the user to utilize the voice assistant with the same keyword (verbal start command) on every device and eliminates the need to memorize a different keyword for each device, which is convenient.
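The device-side behavior may likewise be sketched as follows. This is an illustrative assumption rather than the embodiment's implementation: the keyword `START_COMMAND` and the function `record_after_start_command` are hypothetical, and each input chunk is assumed to arrive already paired with the text heard by a local keyword spotter.

```python
from typing import Iterable, List, Tuple

# Hypothetical verbal start command shared by all of the devices 20.
START_COMMAND = "hello assistant"


def record_after_start_command(chunks: Iterable[Tuple[str, bytes]]) -> List[bytes]:
    """Return the audio chunks that follow the verbal start command.

    Each element of `chunks` is (text heard in the chunk, raw audio).
    Audio before and including the start command is not recorded.
    """
    recording = False
    recorded: List[bytes] = []
    for heard, audio in chunks:
        if recording:
            recorded.append(audio)  # speech after the start command
        elif START_COMMAND in heard.lower():
            recording = True  # start command detected; record from next chunk
    return recorded
```

Because every device runs the same check against the same keyword, a single utterance of the start command causes all devices within range to begin recording.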