In yet another case, the Least Square Errors (L2) loss comprises determining a pixel-wise comparison between the input image and a reconstructed image determined by substituting colors of pixels in the first alpha mask with best matching colors in the predicted color representation.
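By way of illustration only, the reconstruction and pixel-wise comparison described above may be sketched as follows. The sketch assumes details the foregoing does not specify: "best matching" is taken to mean nearest in Euclidean RGB distance, the alpha mask is thresholded at a hypothetical value of 0.5, and the squared error is averaged rather than summed. The function and parameter names are hypothetical.

```python
import numpy as np

def reconstruction_l2_loss(image, alpha_mask, palette, threshold=0.5):
    """Illustrative L2 loss between an image and its palette-based reconstruction.

    image:      (H, W, 3) float array, values in [0, 1]
    alpha_mask: (H, W) float array, values in [0, 1]
    palette:    (K, 3) float array, the predicted color representation
    threshold:  assumed cutoff for deciding which pixels the mask covers
    """
    image = np.asarray(image, dtype=np.float64)
    palette = np.asarray(palette, dtype=np.float64)
    selected = np.asarray(alpha_mask) > threshold

    # Squared distance from every pixel to every palette color: shape (H, W, K).
    dists = ((image[:, :, None, :] - palette[None, None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=-1)

    # Substitute the best matching palette color for each pixel inside the mask.
    reconstructed = image.copy()
    reconstructed[selected] = palette[nearest[selected]]

    # Pixel-wise mean squared error between the input and the reconstruction.
    loss = float(np.mean((image - reconstructed) ** 2))
    return loss, reconstructed
```

Under these assumptions, the loss is zero when every masked pixel already equals one of the palette colors, and grows as the palette fails to cover the colors present in the masked area.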
In yet another case, determining the area loss comprises: starting from a sampled point in a training image, using a flood fill algorithm to define a region of similar colors around the point; and determining the area loss as a sum of each region multiplied by the difference between that region and a corresponding predicted mask.
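As a non-limiting sketch, the flood fill and the area loss may be implemented along the following lines. Several choices here are assumptions made for illustration: the flood fill uses 4-connectivity and compares each pixel to the seed color with a hypothetical tolerance `tol`, and "the difference between the region and a corresponding predicted mask" is read as the absolute per-pixel difference, so that the product penalizes region pixels that the predicted mask fails to cover. The function names are hypothetical.

```python
from collections import deque

import numpy as np

def flood_fill_region(image, seed, tol=0.1):
    """Return a boolean (H, W) region of pixels connected to `seed` whose
    color lies within `tol` (Euclidean distance) of the seed color."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape[:2]
    seed_color = image[seed]
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        # Visit the four axis-aligned neighbors (assumed connectivity).
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                if np.linalg.norm(image[ny, nx] - seed_color) <= tol:
                    region[ny, nx] = True
                    queue.append((ny, nx))
    return region

def area_loss(image, seeds, predicted_masks, tol=0.1):
    """Sum, over sampled points, of region * |region - predicted mask|."""
    total = 0.0
    for seed, mask in zip(seeds, predicted_masks):
        region = flood_fill_region(image, seed, tol).astype(np.float64)
        mask = np.asarray(mask, dtype=np.float64)
        total += float(np.sum(region * np.abs(region - mask)))
    return total
```

With this reading, a predicted mask that is fully opaque over the whole flood-filled region contributes nothing to the loss, which would encourage masks that cover contiguous areas of similar color.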
In yet another case, the adversarial loss comprises training a discriminator network and evaluating it on the output of the third neural network.
In yet another case, training the discriminator network comprises: generating a training mask and a predicted color representation; generating edited images by randomly recoloring the input image by perturbing one or more base colors of the predicted color representation; and training the discriminator network to distinguish the edited images from the input image.
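For illustration only, the random recoloring step may be sketched as below. The sketch assumes one soft mask per base color, Gaussian perturbation of the chosen base colors, and an additive recoloring model in which each pixel is shifted by the mask-weighted change in its base colors; none of these specifics is stated above. The names `max_colors` and `scale` are hypothetical parameters.

```python
import numpy as np

def recolor_by_palette_perturbation(image, masks, palette, rng,
                                    max_colors=2, scale=0.2):
    """Produce an edited image by perturbing a random subset of base colors.

    image:   (H, W, 3) float array, values in [0, 1]
    masks:   (K, H, W) float array, one soft mask per base color (assumed)
    palette: (K, 3) float array, the predicted color representation
    rng:     a numpy.random.Generator
    """
    image = np.asarray(image, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    palette = np.asarray(palette, dtype=np.float64)
    k = palette.shape[0]

    # Choose one or several base colors to perturb.
    n_perturb = int(rng.integers(1, min(max_colors, k) + 1))
    chosen = rng.choice(k, size=n_perturb, replace=False)

    # Perturb the chosen base colors with Gaussian noise (assumed distribution).
    new_palette = palette.copy()
    noise = rng.normal(0.0, scale, size=(n_perturb, 3))
    new_palette[chosen] = np.clip(palette[chosen] + noise, 0.0, 1.0)

    # Shift each pixel by the mask-weighted change in its base colors.
    delta = new_palette - palette
    edited = image + np.einsum("khw,kc->hwc", masks, delta)
    return np.clip(edited, 0.0, 1.0), new_palette, chosen
```

Edited images produced this way, together with the unedited input image, could then serve as the two classes the discriminator network is trained to tell apart.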
In yet another case, the adversarial loss is regularized using a weighted addition of the Least Square Errors (L2) loss and the area loss.
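The weighted addition may be expressed, purely as a sketch, as a linear combination of the three loss terms. The weights `w_l2` and `w_area` and their default values are hypothetical placeholders; the foregoing does not specify them.

```python
def regularized_adversarial_loss(adversarial, l2, area, w_l2=1.0, w_area=1.0):
    """Adversarial loss plus weighted L2 and area loss terms.

    w_l2 and w_area are assumed hyperparameters, not values from the disclosure.
    """
    return adversarial + w_l2 * l2 + w_area * area
```

In such a formulation, larger weights pull the training toward faithful reconstruction and contiguous masks, while smaller weights leave more influence to the discriminator network.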
These and other embodiments are contemplated and described herein. It will be appreciated that the foregoing summary sets out representative aspects of systems and methods to assist skilled readers in understanding the following detailed description.