In various embodiments, the probability for each value of the set of motion data values is selected via a user input through an overlay management interface. The overlay management interface is configured to suppress unselected values of the state space in order to create a selected state space with probabilities for motion combinations of the plurality of motion patterns, based on matching the selected state space with a set of dance motion values. In some such embodiments, the computer model and the plurality of animation elements are generated by processing an image using an overlay template. Some such embodiments then operate by generating, on a display of a user device, an output image using the image, the computer model, and the plurality of animation elements; processing audio inputs to identify a set of audio characteristics for audio data received at a microphone of the user device; and animating the output image using the skinned model and the set of motion data values, including the probability for each value of the set of motion data values.
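As a non-limiting illustration, the suppression of unselected values of the state space can be sketched as a filter over a table of state probabilities. The function name `select_state_space`, the example state labels, and the renormalization of the remaining probabilities so that they sum to one are all assumptions made for this sketch; the embodiments above do not require any particular representation or normalization.

```python
from typing import Dict, Set


def select_state_space(
    probabilities: Dict[str, float], selected: Set[str]
) -> Dict[str, float]:
    """Suppress unselected states and renormalize the remainder.

    Renormalizing is an assumption of this sketch, not a requirement
    of the described embodiments.
    """
    # Keep only the states the user selected in the interface.
    kept = {state: p for state, p in probabilities.items() if state in selected}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no selected state has nonzero probability")
    # Rescale so the selected state space is a valid distribution.
    return {state: p / total for state, p in kept.items()}


# Hypothetical motion states and probabilities, for illustration only.
full_space = {"sway": 0.5, "bounce": 0.25, "spin": 0.25}
selected_space = select_state_space(full_space, {"sway", "bounce"})
```

In this sketch the "spin" state is suppressed, and the probabilities of the two remaining states keep their original 2:1 ratio.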
Similarly, in some embodiments, identifying the plurality of motion patterns comprises receiving, via a user interface input of an overlay management interface, a user selection of the plurality of motion patterns from a set of system motion patterns, wherein a subset of the set of system motion patterns is selected for each animation element of the plurality of animation elements. Such embodiments can also operate where identifying the plurality of speed harmonics comprises selecting a speed harmonic for each user selection of the plurality of motion patterns, such that the state space description of the plurality of motion patterns comprises the selected combinations of motion patterns and speed harmonics for each of the plurality of animation elements.
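By way of example only, one possible way to assemble such a state space description is to pair each selected motion pattern with its speed harmonic for each animation element, and then enumerate every combination across the animation elements. The function name `build_state_space`, the element and pattern labels, and the treatment of a speed harmonic as a numeric multiplier are hypothetical choices for this sketch and are not drawn from the embodiments themselves.

```python
from itertools import product
from typing import Dict, List, Tuple

# One selected (motion pattern, speed harmonic) pair.
Choice = Tuple[str, float]
# One state: an (element, pattern, harmonic) entry per animation element.
State = Tuple[Tuple[str, str, float], ...]


def build_state_space(selection: Dict[str, List[Choice]]) -> List[State]:
    """Enumerate the selected pattern/harmonic combinations across elements."""
    elements = sorted(selection)  # fixed element order for reproducible output
    per_element = [
        [(element, pattern, harmonic) for pattern, harmonic in selection[element]]
        for element in elements
    ]
    # Cartesian product: one choice per animation element in every state.
    return list(product(*per_element))


# Hypothetical user selection: a subset of patterns per animation element,
# each with one selected speed harmonic.
selection = {
    "ears": [("wiggle", 1.0), ("flop", 0.5)],
    "tail": [("wag", 2.0)],
}
states = build_state_space(selection)
```

With two choices for the first element and one for the second, this sketch yields two states, each naming one pattern and harmonic per animation element.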