In the output layer, V-dimensional output vectors yc are generated for C panels, which are not illustrated. C is the predetermined number of panels, and yc indicates the output vectors corresponding to the words preceding and succeeding the given word. W′N×V is a weight between the hidden layer and the output layer and is expressed by an N×V matrix. As initial states of the elements of W′N×V, random values are given, for example.
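The matrix shapes and the random initialization described above may be sketched as follows. This is only an illustration, not the implementation of the distributed representation learning section 11; the vocabulary size V, the hidden-layer size N, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

# Hypothetical sizes for illustration: V words in the vocabulary,
# N dimensions in the hidden layer (the distributed representation).
V, N = 10000, 300

rng = np.random.default_rng(0)

# Weight between the input layer and the hidden layer, a V x N matrix.
W_vn = rng.uniform(-0.5, 0.5, size=(V, N)) / N

# Weight W' between the hidden layer and the output layer, an N x V matrix,
# whose elements are given random values as initial states.
W_nv = rng.uniform(-0.5, 0.5, size=(N, V)) / N
```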
As illustrated in FIG. 3A, the distributed representation learning section 11 uses the Skip-gram model in the neural network composed of the input layer, the hidden layer, and the output layer to learn a distributed representation of the given word. For example, it is assumed that the input vector x is a one-hot vector in which the element corresponding to the given word “apple” included in the reference language learning corpus 21 is 1 and the other elements are 0. When the distributed representation learning section 11 receives the input vector x corresponding to the given word “apple”, the distributed representation learning section 11 multiplies the weight WV×N by the input vector x to generate a word vector h of the hidden layer. Then, the distributed representation learning section 11 multiplies the weight W′N×V by the word vector h to generate the output vectors y of the output layer. For example, the distributed representation learning section 11 executes prediction using WV×N in its initial state. As a result, the distributed representation learning section 11 predicts that the word preceding the given word is “drink” with a probability of 0.1230 and that the word succeeding the given word is “juice” with a probability of 0.1277.
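The forward pass just described can be sketched as follows. This is a minimal sketch rather than the actual processing of the distributed representation learning section 11: the toy vocabulary and the resulting probabilities are assumptions for the example, and the softmax over the output scores is the usual Skip-gram formulation used to read the output vectors as prediction probabilities for the context words.

```python
import numpy as np

# Toy vocabulary for illustration (assumed, not taken from the corpus 21).
vocab = ["apple", "drink", "juice", "eat", "red"]
V = len(vocab)   # vocabulary size
N = 4            # hidden-layer size (distributed representation)

rng = np.random.default_rng(0)
W_vn = rng.normal(scale=0.1, size=(V, N))   # input -> hidden weight (V x N)
W_nv = rng.normal(scale=0.1, size=(N, V))   # hidden -> output weight W' (N x V)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# One-hot input vector x for the given word "apple".
x = np.zeros(V)
x[vocab.index("apple")] = 1.0

# Hidden layer: word vector h obtained from x and W_vn
# (this simply selects the row of W_vn corresponding to "apple").
h = x @ W_vn

# Output layer: scores y obtained from h and W'; in the Skip-gram model the
# same output is used for each of the C context positions (the words
# preceding and succeeding the given word).
y = h @ W_nv
p = softmax(y)

for word, prob in zip(vocab, p):
    print(f"P({word} | apple) = {prob:.4f}")
```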