In an embodiment, for an input image I, the mapping module 122 can predict alpha masks and a color sail for each alpha mask such that approximately all colors in image I are well represented in the complete color sail rig. In most cases, the alpha masks can be learned without any explicit supervision. In particular, the trained first neural network can be leveraged and an image reconstruction loss can be optimized to learn meaningful alpha masks. Optimizing for image reconstruction while the palette network is held constant can force the second neural network to output masks that correspond to image regions whose colors can be well explained by the distribution of colors in a single color sail.
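By way of a non-limiting illustration, the reconstruction objective described above can be sketched as follows. The sketch assumes that each color sail is reduced to a small discrete set of colors, that each pixel is approximated by the nearest color in each sail, and that the loss is a mean squared error. None of these choices is specified in this description, and the names `reconstruct`, `reconstruction_loss`, `alphas`, and `sail_colors` are hypothetical.

```python
import numpy as np


def reconstruct(image, alphas, sail_colors):
    """Blend per-sail approximations of an image using soft masks.

    image:       (H, W, 3) float array
    alphas:      (K, H, W) soft masks, assumed to sum to 1 at every pixel
    sail_colors: (K, P, 3) colors sampled from each of the K color sails
                 (a discrete stand-in for a continuous color sail)
    """
    recon = np.zeros_like(image, dtype=float)
    for alpha, colors in zip(alphas, sail_colors):
        # Distance from every pixel to every color of this sail: (H, W, P).
        dist = np.linalg.norm(
            image[:, :, None, :] - colors[None, None, :, :], axis=-1
        )
        # Best color this sail can offer for each pixel: (H, W, 3).
        nearest = colors[dist.argmin(axis=-1)]
        # Weight that approximation by the sail's soft mask.
        recon += alpha[..., None] * nearest
    return recon


def reconstruction_loss(image, alphas, sail_colors):
    """Mean squared error between the image and its reconstruction
    (an assumed loss form, chosen only for illustration)."""
    diff = image - reconstruct(image, alphas, sail_colors)
    return float(np.mean(diff ** 2))
```

Under these assumptions, the loss is low only when each mask covers pixels that its own sail can represent, which is the pressure that can drive the masks toward meaningful regions when the sail colors are held fixed.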
In this way, the second neural network is trained to automatically split an image into soft regions (alpha masks), where each soft region is associated with its own color sail predicted by the second neural network. Thus, changing a color sail affects only its associated soft region of the image.
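As a non-limiting illustration of this locality, the following sketch composites an image from per-sail color layers weighted by their alpha masks and then edits one layer. It assumes the image is a simple alpha-weighted sum of the layers, and it models an edit to a color sail as a uniform color shift. The names `composite`, `recolor_layer`, `layers`, and `shift` are hypothetical and do not appear in this description.

```python
import numpy as np


def composite(alphas, layers):
    """Alpha-weighted sum of per-sail color layers.

    alphas: (K, H, W) soft masks
    layers: (K, H, W, 3) per-pixel colors drawn from each color sail
    """
    return np.sum(alphas[..., None] * layers, axis=0)


def recolor_layer(layers, k, shift):
    """Return a copy of `layers` in which layer k is shifted by an RGB
    offset, as a stand-in for editing color sail k."""
    edited = layers.copy()
    edited[k] = np.clip(edited[k] + shift, 0.0, 1.0)
    return edited
```

In this sketch, a pixel whose mask value for the edited sail is zero is left unchanged, and a pixel with a partial mask value changes in proportion to that value.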