In the example depicted in FIG. 7A, the second voice processor 760b includes a second buffer 768b and does not include an AEC or a spatial processor. Such a configuration may be beneficial, for example, because wake word engines associated with certain VASes, such as GOOGLE's ASSISTANT, may not require acoustic echo cancellation and/or spatial processing for wake word detection. In other embodiments, the second voice processor 760b may not include a buffer and/or may include an AEC, a spatial processor, and/or other voice processing components. In any event, the components of the second voice processor 760b are configured to process detected sound data and feed the processed sound data to the second wake word engine 770b via the network interfaces 724 (represented by arrows I(b)-I(d)). The second playback device 702b and/or the second wake word engine 770b may be associated with the second VAS 790b and configured to detect a second wake word that is specific to the second VAS 790b and different from the first wake word. For example, the second wake word engine 770b may be associated with GOOGLE's ASSISTANT and configured to run a corresponding wake word detection algorithm (e.g., configured to detect the wake word “OK, Google” or another associated wake word). Thus, in some aspects of the technology, the first and second wake word engines 770a and 770b are configured to detect different wake words associated with different VASes.
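By way of illustration only, the arrangement described above may be sketched in Python as two pipelines, each pairing a voice processor with its own wake word engine. This is a simplified sketch under stated assumptions and not an implementation of any embodiment: frames of sound data are modeled as single words, the names `VoiceProcessor`, `WakeWordEngine`, `toy_aec`, `build_pipelines`, and `run` are hypothetical, the first wake word "hey first" is a placeholder, the AEC stage is a toy stand-in, and the transfer over the network interfaces 724 is reduced to a direct method call.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple

Frame = str  # assumption: one word stands in for one frame of sound data
Stage = Callable[[Frame], Frame]


class VoiceProcessor:
    """Optional processing stages (e.g., AEC, spatial) plus an optional buffer."""

    def __init__(self, stages: Sequence[Stage] = (), buffer_frames: Optional[int] = None):
        self.stages = list(stages)
        # buffer_frames=None (or 0) models a voice processor without a buffer.
        self.buffer = deque(maxlen=buffer_frames) if buffer_frames else None

    def process(self, frame: Frame) -> List[Frame]:
        for stage in self.stages:
            frame = stage(frame)
        if self.buffer is None:
            return [frame]
        self.buffer.append(frame)  # oldest frame is dropped when full
        return list(self.buffer)


class WakeWordEngine:
    """Toy detector: reports whether its wake word appears in the frames."""

    def __init__(self, vas: str, wake_word: str):
        self.vas = vas
        self.wake_word = wake_word.lower().split()

    def detect(self, frames: List[Frame]) -> bool:
        words = [f.lower() for f in frames if f]  # skip frames emptied by a stage
        n = len(self.wake_word)
        return any(words[i:i + n] == self.wake_word for i in range(len(words) - n + 1))


def toy_aec(playback_words: set) -> Stage:
    """Toy echo canceller: blanks frames that match known playback content."""
    return lambda frame: "" if frame.lower() in playback_words else frame


def build_pipelines() -> List[Tuple[VoiceProcessor, WakeWordEngine]]:
    # First pipeline: AEC, pass-through "spatial" stage, and a buffer.
    first = VoiceProcessor(stages=[toy_aec({"la"}), lambda f: f], buffer_frames=4)
    # Second pipeline: buffer only, with no AEC and no spatial stage.
    second = VoiceProcessor(buffer_frames=4)
    return [
        (first, WakeWordEngine("VAS-1", "hey first")),   # placeholder wake word
        (second, WakeWordEngine("VAS-2", "ok google")),
    ]


def run(frames: Sequence[Frame], pipelines) -> List[str]:
    """Feed every frame to every pipeline; list each detecting VAS once."""
    detections: List[str] = []
    for frame in frames:
        for processor, engine in pipelines:
            if engine.detect(processor.process(frame)) and engine.vas not in detections:
                detections.append(engine.vas)
    return detections
```

In this sketch, the same detected sound data is passed to both pipelines, and each wake word engine responds only to its own wake word. Omitting the stages from the second `VoiceProcessor` mirrors the configuration in which the second voice processor 760b includes only the second buffer 768b.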