What is claimed is:

1. A system comprising:
a server apparatus performing a voice assistant function; and
a plurality of devices communicatively connected to each other and the server apparatus,
wherein,
the plurality of devices each records a user's speech through a microphone on each device, and then each transmits recorded data of the user's speech to the server apparatus,
the server apparatus receives the recorded data transmitted from each of the plurality of devices, and then voice-recognizes two or more of the received recorded data in accordance with a predetermined standard to thereby interpret a content of the user's speech to perform the voice assistant, and
the server apparatus selects a device outputting the voice assistant among the plurality of devices according to a predetermined priority, the predetermined priority determined based on analyzing different factors for each of the plurality of devices, the factors comprising a state where the device is in use or not in use, a type of an output unit used in the device, a distance between the device and a user, and performance of the output unit of the device, the server apparatus analyzing the different factors in the foregoing order.

2. The system according to claim 1, wherein:
the plurality of devices starts the recording of the user's speech after a user's predetermined verbal start command is input through the microphone of a device of the plurality of devices.

3. The system according to claim 1, wherein:
the plurality of devices each further transmits recording state information indicating a recording state in recording the user's speech to the server apparatus.

4. The system according to claim 3, wherein:
the server apparatus interprets the content of the user's speech while performing weighting of the received recorded data according to the recording state information of the two or more of the received recorded data.

5. The system according to claim 4, wherein:
the recording state information includes at least one of a recording level, a noise level, and an echo.

6. The system according to claim 4, wherein:
the recording state information includes all information of a recording level, a noise level, and an echo.

7. The system of claim 6, wherein the recording state describes a reliability of the plurality of devices, the reliability comprising a sum of values representing the recording level, the noise level, and the echo.

8. The system of claim 1, wherein:
the type of output unit used in the device refers to whether a headset or speaker is being used, with priority given to the headset over the speaker;
the distance between the device and the user is classified into distance ranges, where shorter distance ranges are given priority over longer distance ranges; and
the performance of the output unit of the device refers to giving priority to a device with an output unit that is loudest or largest.

9. An apparatus comprising:
a communication module for performing data communication with a plurality of devices through a network;
a voice recognition module for voice-recognizing recorded data of a same user's speech received through the communication module and transmitted from each of the plurality of devices in accordance with a predetermined standard to thereby interpret a content of the user's speech; and
a voice assistant module for:
performing voice assistant according to the content of the user's speech interpreted in the voice recognition module; and
selecting a device for outputting the voice assistant among the plurality of devices according to a predetermined priority, the predetermined priority determined based on analyzing different factors for each of the plurality of devices, the factors comprising a state where the device is in use or not in use, a type of an output unit used in the device, a distance between the device and a user, and performance of the output unit of the device, the voice assistant module analyzing the different factors in the foregoing order.

10. The apparatus according to claim 9, wherein:
the voice recognition module interprets the content of the user's speech while performing weighting of a plurality of received recorded data according to recording state information indicating a recording state of each of the plurality of received recorded data transmitted from the plurality of devices.

11. The apparatus according to claim 10, wherein:
the recording state information includes at least one of a recording level, a noise level, and an echo.

12. The apparatus according to claim 10, wherein:
the recording state information includes all information of a recording level, a noise level, and an echo.

13. The apparatus of claim 12, wherein the recording state describes a reliability of the plurality of devices, the reliability comprising a sum of values representing the recording level, the noise level, and the echo.

14. The apparatus of claim 9, wherein:
the type of output unit used in the device refers to whether a headset or speaker is being used, with priority given to the headset over the speaker;
the distance between the device and the user is classified into distance ranges, where shorter distance ranges are given priority over longer distance ranges; and
the performance of the output unit of the device refers to giving priority to a device with an output unit that is loudest or largest.

15. A program product comprising a non-transitory computer-readable storage medium that stores code executable by a processor, the executable code comprising code to perform:
a voice recognition process of voice-recognizing recorded data of a same user's speech received through a communication module and transmitted from each of a plurality of devices in accordance with a predetermined standard to thereby interpret a content of the user's speech; and
a voice assistant process of:
performing voice assistant according to the content of the user's speech interpreted in the voice recognition process; and
selecting a device for outputting the voice assistant among the plurality of devices according to a predetermined priority, the predetermined priority determined based on analyzing different factors for each of the plurality of devices, the factors comprising a state where the device is in use or not in use, a type of an output unit used in the device, a distance between the device and a user, and performance of the output unit of the device, the voice assistant process analyzing the different factors in the foregoing order.

16. The program product of claim 15, wherein the executable code further comprises code to perform interpreting the content of the user's speech while performing weighting of a plurality of received recorded data according to recording state information indicating a recording state of each of the plurality of received recorded data transmitted from the plurality of devices.

17. The program product of claim 16, wherein:
the recording state information includes at least one of a recording level, a noise level, and an echo.

18. The program product of claim 16, wherein:
the recording state information includes all information of a recording level, a noise level, and an echo.

19. The program product of claim 18, wherein the recording state describes a reliability of the plurality of devices, the reliability comprising a sum of values representing the recording level, the noise level, and the echo.

20. The program product of claim 15, wherein:
the type of output unit used in the device refers to whether a headset or speaker is being used, with priority given to the headset over the speaker;
the distance between the device and the user is classified into distance ranges, where shorter distance ranges are given priority over longer distance ranges; and
the performance of the output unit of the device refers to giving priority to a device with an output unit that is loudest or largest.
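The reliability-weighted interpretation recited in the weighting claims (4 to 7, 10 to 13 and 16 to 19) can be illustrated with a minimal Python sketch. The claims do not say how the weights are applied, so this sketch assumes a weighted vote. Each device's recording is assumed to have been recognized separately into a transcript, and that transcript is weighted by the device's reliability, taken as the sum of three scores for recording level, noise level and echo. The class `RecordingState`, the function `interpret`, and the normalized 0-to-1 score scale (higher meaning better conditions) are hypothetical illustrations, not part of the claims.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecordingState:
    # Hypothetical per-device scores, assumed pre-normalized to [0, 1] with
    # higher meaning better conditions (e.g. low noise -> high noise_score).
    # The claims do not define the scale of these values.
    recording_score: float
    noise_score: float
    echo_score: float

    def reliability(self) -> float:
        # Reliability as the sum of the values representing the recording
        # level, the noise level, and the echo (claims 7, 13 and 19).
        return self.recording_score + self.noise_score + self.echo_score


def interpret(results: dict[str, tuple[str, RecordingState]]) -> str:
    """Return the transcript with the largest total reliability weight.

    `results` maps a device id to (transcript recognized from that device's
    recording, that device's recording state). Weighted voting is an assumed
    way of "performing weighting"; the claims leave the method open.
    """
    if not results:
        raise ValueError("no recorded data received")
    votes: dict[str, float] = defaultdict(float)
    for transcript, state in results.values():
        votes[transcript] += state.reliability()
    return max(votes, key=votes.get)


results = {
    "phone": ("turn on the lights", RecordingState(0.9, 0.8, 0.9)),
    "tv": ("turn on the flights", RecordingState(0.4, 0.2, 0.3)),
    "speaker": ("turn on the lights", RecordingState(0.5, 0.5, 0.6)),
}
print(interpret(results))  # turn on the lights
```

Here the noisy TV recording carries little weight, so the two cleaner recordings decide the result. A real system might instead fuse the audio or the recognizer scores directly; the vote is only the simplest form of weighting to show.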
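The ordered priority in the device-selection claims (1, 8, 9, 14, 15 and 20) amounts to a lexicographic comparison: a later factor only breaks ties left by the earlier ones. The Python sketch below expresses that with tuple comparison. The claims do not state whether an in-use or an idle device ranks first, so `prefer_in_use` is an assumed parameter. The `Device` fields, the distance bands in `DISTANCE_RANGES_M`, and the use of `output_power` as the measure of a "loudest or largest" output unit are likewise hypothetical.

```python
from dataclasses import dataclass

# Hypothetical distance ranges in metres (upper bounds). The claims only say
# that distances are classified into ranges, with shorter ranges preferred.
DISTANCE_RANGES_M = (1.0, 3.0, 6.0)


@dataclass
class Device:
    name: str
    in_use: bool
    output_type: str     # assumed values: "headset" or "speaker"
    distance_m: float    # assumed estimate of the device-to-user distance
    output_power: float  # assumed proxy for a loud/large output unit


def distance_range(distance_m: float) -> int:
    """Index of the range the distance falls in (0 = nearest)."""
    for index, upper in enumerate(DISTANCE_RANGES_M):
        if distance_m <= upper:
            return index
    return len(DISTANCE_RANGES_M)


def priority_key(device: Device, prefer_in_use: bool = True) -> tuple[int, int, int, float]:
    # Tuples compare element by element, so the factors are examined in the
    # claimed order. Smaller keys rank higher.
    in_use_rank = 0 if device.in_use == prefer_in_use else 1
    output_rank = 0 if device.output_type == "headset" else 1  # headset first
    return (
        in_use_rank,
        output_rank,
        distance_range(device.distance_m),  # shorter range first
        -device.output_power,               # louder/larger output unit first
    )


def select_output_device(devices: list[Device], prefer_in_use: bool = True) -> Device:
    return min(devices, key=lambda d: priority_key(d, prefer_in_use))


devices = [
    Device("tv", False, "speaker", 2.0, 30.0),
    Device("phone", True, "speaker", 0.5, 2.0),
    Device("tablet", True, "speaker", 0.8, 5.0),
]
print(select_output_device(devices).name)  # tablet
```

In this example the TV drops out on the first factor because it is idle, even though it has the loudest speaker. The phone and the tablet tie on usage, output type and distance range, so the tablet wins only on the last factor.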