When voice processing is allowed to proceed, each of the first and second VASes 790a and 790b may send a response to the corresponding one of the first and second playback devices 702a and 702b, and each response may include instructions to perform an action or to do nothing. The responses from the first and second VASes 790a and 790b may be transmitted at the same time or at different times, and may or may not be in the same order as the corresponding wake word detection. Likewise, performance of the action (if applicable) by the corresponding playback device may occur at the same time or at different times, and may or may not be in the same order as the corresponding wake word detection and/or receipt of the response.
Whether performance of the actions by the first and second playback devices 702a, 702b occurs at least partially at the same time may depend on the nature of the actions to be performed. For example, in the illustrated embodiment, the action for the first playback device 702a is to output the requested media content, while the action for the second playback device 702b is to cause the smart lights to turn on. Turning on the lights does not require output of audio content by the second playback device 702b, and thus the second playback device 702b may perform the action without interfering with the output of the media content by the first playback device 702a. However, if the action does require playback of audio content (for example, the second playback device 702b may output a voice response of “okay” to acknowledge that the voice input has been processed), the first and second playback devices 702a, 702b may coordinate output of their respective audio content.
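By way of illustration only, the decision described above may be sketched as follows. The sketch assumes that each action can be tagged with whether it requires audio output. The names `Action`, `requires_audio_output`, and `needs_coordination` are hypothetical and do not appear in the disclosure, and the sketch does not address how coordination itself would be carried out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical record of an action returned to a playback device."""
    device: str                  # reference numeral of the playback device
    description: str
    requires_audio_output: bool  # assumed flag, not part of the disclosure


def needs_coordination(first: Action, second: Action) -> bool:
    """Return True only when both actions require audio output.

    If at most one action outputs audio, the two actions may be
    performed at the same time without interfering with each other.
    """
    return first.requires_audio_output and second.requires_audio_output


# Actions from the illustrated embodiment.
media = Action("702a", "output requested media content", True)
lights = Action("702b", "cause smart lights to turn on", False)
# Alternative action in which the second device speaks an acknowledgment.
ack = Action("702b", "output voice response 'okay'", True)

print(needs_coordination(media, lights))
print(needs_coordination(media, ack))
```

In this sketch, the media and lights actions may proceed concurrently, whereas the media and acknowledgment actions are flagged as requiring coordination between the first and second playback devices 702a, 702b.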