The third neural network can also be trained in an unsupervised fashion using, for example, an adversarial loss. For each unlabeled image, simulated user input can be generated by sampling a point p in the image such that it is not near any edge (for example, an edge map could be obtained using any number of existing algorithms). The third neural network can then be trained using gradient descent on batches of this data to minimize a loss comprising three terms, E_L2, E_A, and E_GAN. The first term, E_L2, is a pixel-wise least-squares (L2) loss between the original image I and the image reconstructed by substituting the colors of pixels in the mask with the best matching colors in the predicted color sail (for example, similar to Eq. 9). The second term, E_A, is the area loss. Starting from the sampled point p in a training image, a flood fill algorithm could be run to define a region R of similar colors around the point. Because a color sail can model a single color well, R can serve as an approximate lower bound for the area of the predicted mask. R can be represented as a one-channel image with every pixel in R marked as 1. This allows formulating the area-based, unsupervised loss component: E_A = Σ_{x,y} R(x, y)·(R(x, y) − M(x, y)), where M is the predicted mask. Thus, all pixels that are in R but not in M are penalized. However, if M extends beyond R (which is a lower bound), then no error is incurred.
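The following is a minimal sketch of the region construction and the area loss described above. It assumes NumPy and PyTorch, a 4-connected flood fill with an illustrative color tolerance parameter, and hypothetical helper names (flood_fill_region, area_loss) that are not part of the original description; it is an illustration of the stated formula rather than a definitive implementation.

    import numpy as np
    import torch
    from collections import deque

    def flood_fill_region(image, p, tol=0.1):
        """Binary region R around seed point p (illustrative flood fill).

        image: (H, W, 3) float array; a pixel joins R if it is reachable from p
        through 4-connected neighbors and its color is within `tol` (Euclidean
        distance in color space, an assumed threshold) of the seed color.
        Returns a one-channel float array with pixels in R marked as 1.
        """
        h, w, _ = image.shape
        seed = image[p[0], p[1]]
        R = np.zeros((h, w), dtype=np.float32)
        visited = np.zeros((h, w), dtype=bool)
        queue = deque([p])
        visited[p[0], p[1]] = True
        while queue:
            y, x = queue.popleft()
            if np.linalg.norm(image[y, x] - seed) <= tol:
                R[y, x] = 1.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
        return R

    def area_loss(R, M):
        """E_A = sum_{x,y} R(x, y) * (R(x, y) - M(x, y)).

        R: binary region tensor (values in {0, 1}); M: predicted soft mask in [0, 1].
        Pixels in R that M misses are penalized; where M extends beyond R,
        R is 0, so no error is incurred.
        """
        return (R * (R - M)).sum()

    # Example usage (shapes and values are illustrative):
    img = np.random.rand(64, 64, 3).astype(np.float32)
    p = (32, 32)                                  # sampled point away from edges
    R = torch.from_numpy(flood_fill_region(img, p))
    M = torch.rand(64, 64, requires_grad=True)    # stand-in for the network's predicted mask
    loss = area_loss(R, M)
    loss.backward()

In a full training loop, this term would be combined with the reconstruction loss E_L2 and the adversarial loss E_GAN (weights and details as discussed elsewhere), with only the predicted mask M carrying gradients; R is precomputed per training sample.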