We claim:

1. A method, comprising:
capturing voice input via a network microphone device (NMD) of a media playback system, the media playback system comprising one or more local network devices, including the network microphone device, within a physical environment and one or more first remote computing devices, wherein the voice input comprises a request for media content;
transmitting the voice input from the NMD to one or more second remote computing devices associated with a voice assistant service for deriving intent information regarding the request for media content based at least on the voice input;
receiving, at the media playback system, a response from the one or more second remote computing devices associated with the voice assistant service, wherein the response comprises the derived intent information;
based at least in part on the derived intent information, requesting, via the media playback system and independent of the voice assistant service, media content information from a plurality of media content services, wherein the requesting comprises requesting the media content information from (i) at least one third remote computing device associated with a first media content service and (ii) at least one fourth remote computing device associated with a second media content service;
receiving, at the media playback system and independent of the voice assistant service, first information from the at least one third remote computing device and second information from the at least one fourth remote computing device, wherein the first information identifies first media content available via the first media content service for playback and the second information identifies second media content available via the second media content service for playback; and
after receiving at least one of the first information and the second information, (i) selecting, via the media playback system, the first media content and forgoing selection of the second media content, (ii) transmitting a uniform resource identifier (URI) or uniform resource locator (URL) associated with the first media content from the one or more first remote computing devices of the media playback system to the NMD, (iii) requesting, via the NMD, the first media content, via the URI or URL, from the at least one third remote computing device of the first media content service for playback, and (iv) playing back the first media content via the NMD.

2. The method of claim 1, further comprising:
transmitting, via the media playback system, a request for a voice response to the one or more second remote computing devices of the voice assistant service, wherein the request for the voice response is based at least on one of the first information and the second information; and
receiving and playing back, via the media playback system, the voice response.

3. The method of claim 2, wherein the voice response is at least one of (a) a request for additional information regarding the request for media content, and (b) an acknowledgement of receipt of the request for media content.

4. The method of claim 2, wherein the voice response identifies the first media content available via the first media content service, the first media content service, the second media content available via the second media content service, and the second media content service.

5. The method of claim 1, further comprising, (i) after receiving the selection initiating the playback of the first media content, and (ii) after initiating the playback of the first media content, transmitting a request for a voice response to the one or more second remote computing devices of the voice assistant service.

6. The method of claim 1, wherein the response received from the one or more second remote computing devices associated with the voice assistant service includes a message comprising a plurality of predetermined fields, wherein at least one of the predetermined fields is populated by the voice assistant service with at least a portion of the derived intent information.

7. The method of claim 1, wherein the derived intent information comprises a predefined data structure including one or more media content attributes, and wherein requesting media content information from the plurality of media content services comprises querying the media content services for media corresponding to the media content attributes.

8. A media playback system, comprising:
one or more processors;
at least one network microphone device (NMD) comprising at least one microphone;
one or more first remote computing devices; and
tangible, non-transitory, computer-readable media storing instructions executable by the one or more processors to cause the media playback system to perform operations comprising:
capturing voice input via the NMD, wherein the voice input comprises a request for media content;
transmitting the voice input to one or more second remote computing devices associated with a voice assistant service for deriving intent information regarding the request for media content based at least on the voice input;
receiving a response from the one or more second remote computing devices, wherein the response comprises the derived intent information;
based at least in part on the derived intent information, requesting, independent of the voice assistant service, media content information from a plurality of media content services, wherein the requesting comprises requesting the media content information from (i) at least one third remote computing device associated with a first media content service and (ii) at least one fourth remote computing device associated with a second media content service;
receiving, independent of the voice assistant service, first information from the at least one third remote computing device and second information from the at least one fourth remote computing device, wherein the first information identifies first media content available via the first media content service for playback and the second information identifies second media content available via the second media content service for playback; and
after receiving at least one of the first information and the second information, (i) selecting, via the media playback system, the first media content and forgoing selection of the second media content, (ii) transmitting a uniform resource identifier (URI) or uniform resource locator (URL) associated with the first media content from the one or more first remote computing devices of the media playback system to the NMD, (iii) requesting, via the NMD, the first media content, via the URI or URL, from the at least one third remote computing device of the first media content service for playback, and (iv) playing back the first media content via the NMD.

9. The media playback system of claim 8, the operations further comprising:
transmitting, via the media playback system, a request for a voice response to the one or more second remote computing devices of the voice assistant service, wherein the request for the voice response is based at least on one of the first information and the second information; and
receiving and playing back, via the media playback system, the voice response.

10. The media playback system of claim 9, wherein the voice response is at least one of (a) a request for additional information regarding the request for media content, and (b) an acknowledgement of receipt of the request for media content.

11. The media playback system of claim 9, wherein the voice response identifies the first media content available via the first media content service, the first media content service, the second media content available via the second media content service, and the second media content service.

12. The media playback system of claim 8, the operations further comprising, (i) after receiving the selection initiating the playback of the first media content, and (ii) after initiating the playback of the first media content, transmitting a request for a voice response to the one or more second remote computing devices of the voice assistant service.

13. The media playback system of claim 8, wherein the derived intent information comprises a predefined data structure including one or more media content attributes, and wherein requesting media content information from the plurality of media content services comprises querying the media content services for media corresponding to the media content attributes.

14. Tangible, non-transitory, computer-readable media storing instructions executable by one or more processors to cause a media playback system to perform operations comprising:
capturing voice input via a network microphone device (NMD) of the media playback system, the media playback system comprising one or more local network devices, including the network microphone device, within a physical environment and one or more first remote computing devices, wherein the voice input comprises a request for media content;
transmitting the voice input from the media playback system to one or more second remote computing devices associated with a voice assistant service for deriving intent information regarding the request for media content based at least on the voice input;
receiving, at the media playback system, a response from the one or more second remote computing devices associated with the voice assistant service, wherein the response comprises the derived intent information;
based at least in part on the derived intent information, requesting, independent of the voice assistant service, media content information from a plurality of media content services, wherein the requesting comprises requesting the media content information from (i) at least one third remote computing device associated with a first media content service and (ii) at least one fourth remote computing device associated with a second media content service;
receiving, at the media playback system and independent of the voice assistant service, first information from the at least one third remote computing device and second information from the at least one fourth remote computing device, wherein the first information identifies first media content available via the first media content service for playback and the second information identifies second media content available via the second media content service for playback; and
after receiving at least one of the first information and the second information, (i) selecting, via the media playback system, the first media content and forgoing selection of the second media content, (ii) transmitting a uniform resource identifier (URI) or uniform resource locator (URL) associated with the first media content from the one or more first remote computing devices of the media playback system to the NMD, (iii) requesting, via the NMD, the first media content, via the URI or URL, from the at least one third remote computing device of the first media content service for playback, and (iv) playing back the first media content via the NMD.

15. The tangible, non-transitory, computer-readable media of claim 14, the operations further comprising:
transmitting, via the media playback system, a request for a voice response to the one or more second remote computing devices of the voice assistant service, wherein the request for the voice response is based at least on one of the first information and the second information; and
receiving and playing back, via the NMD, the voice response.

16. The tangible, non-transitory, computer-readable media of claim 15, wherein the voice response is at least one of (a) a request for additional information regarding the request for media content, and (b) an acknowledgement of receipt of the request for media content.

17. The tangible, non-transitory, computer-readable media of claim 15, wherein the voice response identifies the first media content available via the first media content service, the first media content service, the second media content available via the second media content service, and the second media content service.

18. The tangible, non-transitory, computer-readable media of claim 14, the operations further comprising, (i) after receiving the selection initiating the playback of the first media content, and (ii) after initiating the playback of the first media content, transmitting a request for a voice response to the one or more second remote computing devices of the voice assistant service.

19. The tangible, non-transitory, computer-readable media of claim 14, wherein the derived intent information comprises a predefined data structure including one or more media content attributes, and wherein requesting media content information from the plurality of media content services comprises querying the media content services for media corresponding to the media content attributes.
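The following Python sketch is a non-authoritative illustration of the flow recited in claims 1 and 7, written under stated assumptions. The voice assistant round trip is represented by an already-derived `Intent` object holding media content attributes. The playback system then queries several media services independently of the voice assistant, selects one match, forgoes the others, and hands that match's URI to the NMD for playback. Every name here (`Intent`, `MediaItem`, `MediaService`, `NMD`, `handle_request`), the substring matching rule, and the "first service with a match wins" selection policy are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Intent:
    """Hypothetical predefined structure for derived intent information (cf. claim 7)."""
    action: str
    attributes: Dict[str, str]  # media content attributes, e.g. {"track": "..."}


@dataclass
class MediaItem:
    service: str
    title: str
    uri: str  # URI/URL from which the NMD would fetch the content


class MediaService:
    """Stand-in for a remote media content service with a canned catalog."""

    def __init__(self, name: str, catalog: List[MediaItem]):
        self.name = name
        self.catalog = catalog

    def search(self, attributes: Dict[str, str]) -> List[MediaItem]:
        # Toy matching rule (assumption): every attribute value must appear in the title.
        return [
            item for item in self.catalog
            if all(v.lower() in item.title.lower() for v in attributes.values())
        ]


@dataclass
class NMD:
    """Stand-in for a network microphone device; records the URIs it is asked to play."""
    played: List[str] = field(default_factory=list)

    def play(self, uri: str) -> None:
        # A real NMD would request the stream from the service hosting the URI.
        self.played.append(uri)


def handle_request(intent: Intent, services: List[MediaService],
                   nmd: NMD) -> Optional[MediaItem]:
    # Query every media service independently of the voice assistant service.
    results = {svc.name: svc.search(intent.attributes) for svc in services}
    # Hypothetical policy: the first service (in list order) with a match is selected;
    # matches from the remaining services are forgone.
    for svc in services:
        if results[svc.name]:
            chosen = results[svc.name][0]
            nmd.play(chosen.uri)  # the system passes the selected URI to the NMD
            return chosen
    return None


if __name__ == "__main__":
    svc_a = MediaService("ServiceA", [MediaItem("ServiceA", "Blue Song", "https://a.example/track/1")])
    svc_b = MediaService("ServiceB", [MediaItem("ServiceB", "Blue Song (Live)", "https://b.example/t/9")])
    nmd = NMD()
    item = handle_request(Intent("play", {"track": "blue song"}), [svc_a, svc_b], nmd)
    print(item.service, nmd.played)  # ServiceA ['https://a.example/track/1']
```

Both services return a match in the usage example, and only the first service's URI reaches the NMD. This mirrors "selecting ... the first media content and forgoing selection of the second media content." A real system would likely rank candidates by user preferences or subscriptions rather than by list order.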