FIG. 9 illustrates an example of a model architecture according to system 100. In this example, a second neural network comprises a U-Net architecture that produces alpha masks; each alpha mask gates a color histogram, and each gated histogram is used to produce one color sail. As described herein, given an image I, a plurality of alpha masks are produced (“Na” number of alpha masks). For each alpha mask, the colors in the corresponding region are encoded into a histogram, from which a single color sail is output. A machine learning model is trained by the mapping module 122 to predict alpha masks such that the color distribution in the region under each mask can be explained using a single color sail. In an embodiment, while the mapping module 122 runs the machine learning model end-to-end, palette prediction can first be trained separately with a first neural network (referred to as a palette network). This can be beneficial because: (1) color sail fitting is an independent problem, a solution to which may be useful outside of the context of image segmentation; and (2) a pre-trained palette network allows the second neural network (referred to as an alpha network) to focus on learning segmentation without conflating its search direction with a separate task.
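As a non-limiting illustration of how an alpha mask can gate a histogram, the following minimal sketch weights each pixel's contribution to an RGB histogram by its mask value. The function name `alpha_gated_histogram`, the `bins` parameter, the flat bin layout, and the toy data are assumptions made for illustration only; the encoding used by the palette network is not specified at this level of detail.

```python
def alpha_gated_histogram(pixels, alpha, bins=4):
    """Alpha-weighted RGB histogram (illustrative sketch, not the claimed encoding).

    pixels: list of (r, g, b) tuples with channels in [0, 1]
    alpha:  list of per-pixel mask weights in [0, 1], same length as pixels
    bins:   hypothetical number of bins per channel
    """
    if len(pixels) != len(alpha):
        raise ValueError("pixels and alpha must have the same length")
    hist = [0.0] * (bins ** 3)
    for (r, g, b), a in zip(pixels, alpha):
        # Map each channel to a bin index; clamp so that 1.0 lands in the last bin.
        ri, gi, bi = (min(int(c * bins), bins - 1) for c in (r, g, b))
        # A pixel contributes in proportion to its mask value (the "gate").
        hist[(ri * bins + gi) * bins + bi] += a
    total = sum(hist)
    # Normalize to sum to 1 whenever the mask covers at least one pixel.
    return [h / total for h in hist] if total > 0 else hist


# Toy image: two reddish and two bluish pixels, with Na = 2 complementary masks.
pixels = [(1.0, 0.0, 0.0), (0.9, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.1, 0.9)]
masks = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
histograms = [alpha_gated_histogram(pixels, m, bins=2) for m in masks]
```

In this toy case each mask isolates one color family, so each gated histogram concentrates in a single bin. This is the kind of compact distribution that a single color sail could plausibly explain.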
In this case, the U-Net architecture is used; however, it is contemplated that any suitable architecture may be used. The U-Net architecture is an encoder-decoder convolutional neural network architecture that has additional connections between encoder and decoder layers of the same resolution. This allows the decoder to take in high-frequency details from the encoder, generally resulting in a crisper final output than a traditional encoder-decoder architecture, which loses high-frequency detail during encoding and lacks the information to recover that detail during decoding.
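The effect of the same-resolution connections can be sketched with a one-dimensional toy example. This is not a U-Net implementation: the averaging and repetition steps stand in for learned convolutional layers, the residual combination stands in for a learned layer applied to the concatenated features, and all function names are hypothetical.

```python
def downsample(signal):
    """Encoder stand-in: halve resolution by averaging adjacent pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def upsample(signal):
    """Decoder stand-in: double resolution by repeating each sample."""
    return [v for v in signal for _ in range(2)]


def decode_plain(signal):
    """Encoder-decoder without skip connections: only coarse features survive."""
    return upsample(downsample(signal))


def decode_with_skip(signal):
    """Encoder-decoder with a skip connection at the input resolution."""
    coarse = upsample(downsample(signal))
    # Skip connection: pair each decoder sample with the encoder feature of the
    # same resolution (analogous to channel concatenation in a U-Net).
    features = list(zip(coarse, signal))
    # Stand-in for a learned layer: add back the detail missing from the coarse path.
    return [c + (s - c) for c, s in features]


signal = [0.0, 1.0, 0.0, 1.0]       # high-frequency alternation
plain = decode_plain(signal)        # detail averaged away
skipped = decode_with_skip(signal)  # detail available to the decoder
```

Here the plain path flattens the alternating signal to a constant, while the path with the skip connection still has the original detail available to reconstruct from.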