With reference to the first and second aspects, in a first possible implementation of the first and second aspects of the present invention, the at least one prediction model is trained using a loss score, wherein computing the loss score comprises: computing the target set of labels by applying the at least one set operator to the plurality of input sets of labels; computing the model set of labels by providing the model group of features to at least one classification model; and computing a difference between the target set of labels and the model set of labels. Using this difference in the loss score that trains the at least one prediction model forces the at least one prediction model to learn to synthesize the new group of features corresponding to the target set of labels solely by observing the two or more groups of features, without being explicitly provided with the respective labels of those groups. This increases the accuracy of the output of the at least one prediction model when the input digital images comprise one or more unknown features that are not explicitly labeled.
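The three steps of computing the loss score can be illustrated with a minimal sketch. The text above does not specify how the label sets are encoded, which classification model is used, or how the difference is measured. This sketch therefore assumes the following: labels come from a small fixed vocabulary and are encoded as multi-hot vectors; a per-label sigmoid classifier stands in for the classification model; and the difference is measured as mean binary cross-entropy. The names `LABELS`, `apply_set_operator`, `classify`, and `loss_score` are hypothetical and used only for illustration.

```python
import math

# Hypothetical label vocabulary; each set of labels is a Python set of strings.
LABELS = ["cat", "dog", "person", "car"]


def apply_set_operator(op, label_sets):
    """Compute the target set of labels by folding a set operator over the input sets."""
    result = set(label_sets[0])
    for s in label_sets[1:]:
        if op == "union":
            result |= s
        elif op == "intersection":
            result &= s
        elif op == "difference":
            result -= s
        else:
            raise ValueError(f"unknown set operator: {op}")
    return result


def classify(features, weights, biases):
    """Hypothetical stand-in for the classification model: one sigmoid score per label."""
    probs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, features)) + b
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


def to_label_set(probs, threshold=0.5):
    """Model set of labels: every label whose score reaches the (assumed) threshold."""
    return {label for label, p in zip(LABELS, probs) if p >= threshold}


def to_multi_hot(label_set):
    """Encode a set of labels as a 0/1 vector over LABELS."""
    return [1.0 if label in label_set else 0.0 for label in LABELS]


def loss_score(target_set, predicted_probs, eps=1e-7):
    """Mean binary cross-entropy between the target set and the classifier's scores."""
    target = to_multi_hot(target_set)
    total = 0.0
    for t, p in zip(target, predicted_probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(LABELS)


if __name__ == "__main__":
    # Two input sets of labels and an intersection operator give the target set.
    inputs = [{"cat", "person"}, {"dog", "person"}]
    target = apply_set_operator("intersection", inputs)
    # Hypothetical scores the classifier might assign to a synthesized feature group.
    probs = [0.1, 0.2, 0.9, 0.05]
    print(target, to_label_set(probs), loss_score(target, probs))
```

The sketch measures the difference on the classifier's scores rather than on the thresholded model set of labels. This keeps the loss differentiable with respect to the synthesized features. A count of mismatched labels, by contrast, would give a prediction model no gradient to learn from.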