When the C output vectors differ from the predefined predicted vectors, the Skip-gram model learns the differences between the vectors by updating the weights serving as parameters, first the weight W′_{N×V} between the hidden layer and the output layer and then the weight W_{V×N} between the input layer and the hidden layer. The parameters are updated by, for example, back propagation.
The hidden-layer word vector h obtained by repeatedly executing this learning is the distributed representation of the given word (the input vector x).
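The following is a minimal sketch of one such update step, assuming a full-softmax Skip-gram objective with toy sizes V and N; the variable names, learning rate, and word indices are illustrative only and are not taken from the source.

```python
# Sketch of one Skip-gram update step (full softmax), for illustration only.
import numpy as np

V, N = 10, 5          # vocabulary size, hidden-layer size (toy values)
C = 2                 # number of context (output) words per given word
rng = np.random.default_rng(0)

W = rng.normal(scale=0.1, size=(V, N))        # input -> hidden weights,  W_{V x N}
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden -> output weights, W'_{N x V}
lr = 0.05

center = 3            # index of the given word (input vector x as a one-hot index)
context = [2, 7]      # indices of the C predefined context (predicted) words

# Forward pass: the hidden vector h is the row of W selected by the given word.
h = W[center]                       # shape (N,)
scores = h @ W_prime                # shape (V,), one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()                # softmax over the vocabulary

# Error summed over the C output vectors: predicted probabilities minus one-hot targets.
err = C * probs.copy()
for w in context:
    err[w] -= 1.0

# Back propagation: update W' (hidden -> output) first, then W (input -> hidden).
W_prime -= lr * np.outer(h, err)
W[center] -= lr * (W_prime @ err)

print("updated distributed representation h:", W[center])
```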
A technique is known for learning distributed representations of words in two different tasks and using the learned distributed representations to learn a vector mapping between the tasks (refer to, for example, Madhyastha, Pranava Swaroop, et al., “Mapping Unseen Words to Task-Trained Embedding Spaces”). In this technique, to produce a distributed representation of a word that is unknown in a certain task, a distributed representation learned in another task is mapped into that task's embedding space via an objective function.
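As a rough illustration of such cross-task mapping, the sketch below fits a simple linear map by least squares between two embedding spaces and applies it to a word unknown in the target task; the cited work defines its own objective function, so this is only an assumed stand-in, and all names and sizes are hypothetical.

```python
# Sketch: map distributed representations from task A's space into task B's space.
import numpy as np

rng = np.random.default_rng(1)
N_a, N_b = 5, 4                  # embedding sizes in task A and task B (toy values)

# Distributed representations of the same words learned separately in two tasks.
emb_task_a = rng.normal(size=(100, N_a))   # rows: words known in task A
emb_task_b = rng.normal(size=(100, N_b))   # rows: the same words in task B

# Learn a linear mapping M minimizing ||emb_task_a @ M - emb_task_b||^2.
M, *_ = np.linalg.lstsq(emb_task_a, emb_task_b, rcond=None)

# For a word unknown in task B but known in task A, map its task-A vector.
unknown_vec_a = rng.normal(size=(N_a,))
mapped_vec_b = unknown_vec_a @ M           # its induced representation in task B
print(mapped_vec_b.shape)                  # (N_b,)
```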