The neural network 1310 is a machine learning model trained to receive as input the pre-processed acoustic signal 1302 and to generate as output a sequence of actuator signals that represents the acoustic signal 1302 while obeying a variety of constraints 1312. The neural network 1310 may be a convolutional neural network. The convolutional neural network processes multiple slices (or frames) of the spectrogram 1306 together in order to generate the sequence of actuator signals. The actuator signals generated for a given set of slices are intended to represent the center slice of that set. In another embodiment, the neural network 1310 is a recurrent neural network, which has nodes that form a directed cycle. In this fashion, the neural network 1310 may process each slice of the spectrogram 1306 separately, with each slice influencing the state of the neural network 1310 for the processing of the subsequent slice. The neural network 1310 may use any of a common set of activation functions, such as the sigmoid, softmax, rectifier, or hyperbolic tangent. The input features are fed into the neural network 1310, which has been initialized with randomized weights. The output of the neural network 1310 indicates a combination of haptic cues for each slice of the input audio. Each of these haptic cues indicates the haptic output for a cutaneous actuator, such as cutaneous actuator 108, and may be represented by one or more nodes in the final layer of the neural network 1310. Each such node may indicate, for a cutaneous actuator 108, a percentage value that may be used to determine whether to activate a particular state of that cutaneous actuator. For example, if the percentage is below 50%, then the cutaneous actuator should not be activated. Multiple nodes may be used if the cutaneous actuator has more than two states.
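By way of illustration only, the following Python sketch shows one way the slice windowing and the interpretation of the final-layer outputs described above might be expressed. It does not implement the neural network 1310 itself. The function names, the padding of edge slices by repetition, the treatment of a value of exactly 50% as "activate", and the selection of the highest-valued node for an actuator with more than two states are all assumptions made for this example and are not taken from the description.

```python
from typing import List, Sequence

# The 50% cutoff from the description, expressed as a fraction.
ACTIVATION_THRESHOLD = 0.5


def context_windows(frames: Sequence[Sequence[float]],
                    half_width: int) -> List[List[Sequence[float]]]:
    """Group spectrogram slices into windows of 2 * half_width + 1 slices.

    Each window is centered on one slice. At the edges the first or last
    slice is repeated (an assumption; the description does not say how
    edge slices are handled).
    """
    n = len(frames)
    windows = []
    for center in range(n):
        window = [frames[min(max(i, 0), n - 1)]
                  for i in range(center - half_width, center + half_width + 1)]
        windows.append(window)
    return windows


def decode_two_state(node_outputs: Sequence[float]) -> List[bool]:
    """One output node per actuator: do not activate below 50%.

    Activating at exactly 50% is an assumption of this sketch.
    """
    return [p >= ACTIVATION_THRESHOLD for p in node_outputs]


def decode_multi_state(node_outputs: Sequence[Sequence[float]]) -> List[int]:
    """Several output nodes per actuator, one per state.

    Picks the index of the highest-valued node (a hypothetical rule).
    """
    return [max(range(len(scores)), key=lambda s: scores[s])
            for scores in node_outputs]


# Three toy slices with two frequency bins each (made-up values).
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
windows = context_windows(frames, half_width=1)
# windows[1] holds slices 0, 1 and 2, with slice 1 at the center.

print(decode_two_state([0.12, 0.50, 0.93]))          # [False, True, True]
print(decode_multi_state([[0.7, 0.2, 0.1],
                          [0.1, 0.3, 0.6]]))         # [0, 2]
```

In this sketch each window would be supplied to a convolutional network as one input, and the decoded values would be computed once per window, so that each set of actuator states corresponds to the center slice of that window.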