In use, the network microphone devices 103 are configured to interact with a voice assistant service (“VAS”), such as a first VAS 160 hosted by one or more of the remote computing devices 106a. For example, as shown in FIG. 1B, the NMD 103f is configured to receive voice input 121 from a user 123. The NMD 103f transmits data associated with the received voice input 121 to the remote computing devices 106a of the VAS 160, which are configured to (i) process the received voice input data and (ii) transmit a corresponding command to the MPS 100. In some aspects, for example, the remote computing devices 106a comprise one or more modules and/or servers of a VAS (e.g., a VAS operated by one or more of SONOS, AMAZON, GOOGLE, APPLE, MICROSOFT). The remote computing devices 106a can receive the voice input data from the NMD 103f, for example, via the LAN 111 and the router 109. In response to receiving the voice input data, the remote computing devices 106a process the voice input data (e.g., “Play Hey Jude by The Beatles”) and may determine that the processed voice input includes a command to play a song (e.g., “Hey Jude”). In response, one of the computing devices 106a of the VAS 160 transmits a command to one or more remote computing devices (e.g., remote computing devices 106d) associated with the MPS 100. In this example, the VAS 160 may transmit a command to the MPS 100 to play back “Hey Jude” by The Beatles. As described below, the MPS 100, in turn, can query a plurality of suitable media content services (“MCS(es)”) 167 for media content, such as by sending a request to a first MCS hosted by a first set of one or more remote computing devices 106b and to a second MCS hosted by a second set of one or more remote computing devices 106c. In some aspects, for example, the remote computing devices 106b and 106c comprise one or more modules and/or servers of a corresponding MCS (e.g., an MCS operated by one or more of SPOTIFY, PANDORA, AMAZON MUSIC, etc.).
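The request flow described above (NMD to VAS, VAS to MPS, MPS to a plurality of MCSes) can be illustrated with a minimal sketch. All names here (`vas_process`, `MediaContentService`, `mps_query_services`, the catalog URIs) are hypothetical and chosen for illustration only; in particular, the regular expression merely stands in for the speech recognition and natural language understanding a real VAS would perform, and each MCS is modeled as an in-memory catalog rather than a remote service.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PlayCommand:
    """Hypothetical command a VAS might send to the MPS."""
    track: str
    artist: str


def vas_process(voice_input: str) -> Optional[PlayCommand]:
    """Hypothetical VAS step: map an already-transcribed utterance to a play command.

    A simple regex stands in for real speech/NLU processing (an assumption).
    """
    match = re.match(r"^play (.+) by (.+)$", voice_input.strip(), re.IGNORECASE)
    if match is None:
        return None  # not a recognized play request
    return PlayCommand(track=match.group(1), artist=match.group(2))


class MediaContentService:
    """Hypothetical MCS modeled as an in-memory catalog keyed by (track, artist)."""

    def __init__(self, name: str, catalog: Dict[Tuple[str, str], str]):
        self.name = name
        # Keys are lowercased so lookups are case-insensitive.
        self.catalog = {(t.lower(), a.lower()): uri for (t, a), uri in catalog.items()}

    def search(self, cmd: PlayCommand) -> Optional[str]:
        return self.catalog.get((cmd.track.lower(), cmd.artist.lower()))


def mps_query_services(
    cmd: PlayCommand, services: List[MediaContentService]
) -> List[Tuple[str, str]]:
    """Hypothetical MPS step: query every MCS and collect (service name, URI) hits."""
    results = []
    for service in services:
        uri = service.search(cmd)
        if uri is not None:
            results.append((service.name, uri))
    return results
```

For example, `vas_process("Play Hey Jude by The Beatles")` yields a `PlayCommand` for “Hey Jude” by The Beatles, and passing it to `mps_query_services` with two services returns a hit only from the service whose catalog contains the track. Querying every service and collecting all hits, rather than stopping at the first, leaves the MPS free to choose among sources afterward; this is one possible design, not necessarily the one contemplated here.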