Next, as illustrated in FIG. 3B, when the actually calculated output vectors y differ from the predefined predicted vectors, the distributed representation learning section 11 updates the weights serving as parameters, in the order of W′N×V and then WV×N, based on the differences between the output vectors y and the predefined predicted vectors. This update of the parameters is referred to as back propagation, for example. Then, the distributed representation learning section 11 multiplies the updated weight WV×N by the input vector x to generate a word vector h of the hidden layer, and multiplies the updated weight W′N×V by the word vector h to generate output vectors y of the output layer. In other words, the distributed representation learning section 11 executes the prediction again using the updated W′N×V and WV×N. As a result, the distributed representation learning section 11 predicts that the word preceding the given word is “drink” with a probability of 0.1236 and that the word succeeding the given word is “juice” with a probability of 0.1289. These probabilities are slightly higher than the previously predicted probabilities.
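The update and re-prediction described above can be illustrated with a minimal sketch. The sketch assumes a softmax output layer trained with a cross-entropy loss, an illustrative vocabulary size V, hidden-layer size N, learning rate, and target word index, none of which are specified in the description; it is not a definitive implementation of the distributed representation learning section 11.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 10, 4   # illustrative vocabulary and hidden-layer sizes (assumed)
lr = 0.1       # learning rate (assumed; not given in the description)

W = rng.normal(scale=0.1, size=(V, N))       # W_{V x N}: input layer -> hidden layer
W_out = rng.normal(scale=0.1, size=(N, V))   # W'_{N x V}: hidden layer -> output layer

x = np.zeros(V); x[3] = 1.0   # one-hot input vector x for the given word
t = np.zeros(V); t[5] = 1.0   # predefined predicted (target) vector

def forward(x):
    h = x @ W                               # word vector h of the hidden layer
    u = h @ W_out
    y = np.exp(u - u.max()); y /= y.sum()   # output vector y (softmax)
    return h, y

# Forward pass, then back propagation of the difference y - t,
# updating W'_{N x V} first and W_{V x N} second.
h, y = forward(x)
e = y - t                       # difference between output and target vectors
grad_h = W_out @ e              # gradient w.r.t. h, taken before W' is updated
W_out -= lr * np.outer(h, e)    # update W'_{N x V}
W -= lr * np.outer(x, grad_h)   # update W_{V x N}

# Execute the prediction again with the updated weights; the probability
# assigned to the target word is slightly higher than before.
_, y_new = forward(x)
print(y[5], y_new[5])
```

After the single update step, `y_new[5]` exceeds `y[5]`, mirroring the slight increase in the predicted probabilities (e.g., 0.1236 and 0.1289) described above.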