In some embodiments, each layer 710, . . . , 720 may be configured to apply a graph convolution operation on the set of neighbor nodes based on weights specific to the set of neighbor nodes, to obtain the word representation for the given node. The weights indicate the respective contributions of the set of neighbor nodes to the given node. The weights for the set of neighbor nodes may be the same or different from layer to layer. In an example, the graph convolution module 526 may combine the set of neighbor nodes (i.e., the word embeddings of the corresponding words) by means of a weighted summation based on the respective weights, and may then perform a multi-layer perceptron (MLP) operation or a perceptron operation on the result of the combination.
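By way of illustration only, the weighted summation followed by a perceptron operation may be sketched as below. The function names `weighted_sum`, `perceptron`, and `convolve_node`, the use of scalar weights, and the choice of a ReLU activation are assumptions made for this sketch and are not prescribed by the embodiments.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_sum(neighbors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Combine neighbor embeddings using one scalar weight per neighbor."""
    if len(neighbors) != len(weights):
        raise ValueError("one weight is required per neighbor node")
    dim = len(neighbors[0])
    combined = [0.0] * dim
    for vec, w in zip(neighbors, weights):
        for i in range(dim):
            combined[i] += w * vec[i]
    return combined


def perceptron(x: Vector, matrix: Sequence[Vector], bias: Vector) -> Vector:
    """Single linear layer followed by ReLU (the activation is an assumption)."""
    return [
        max(0.0, sum(m * v for m, v in zip(row, x)) + b)
        for row, b in zip(matrix, bias)
    ]


def convolve_node(
    neighbors: Sequence[Vector],
    weights: Sequence[float],
    matrix: Sequence[Vector],
    bias: Vector,
) -> Vector:
    """Word representation of a node: weighted sum, then perceptron."""
    return perceptron(weighted_sum(neighbors, weights), matrix, bias)
```

For instance, `convolve_node([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.25], [[1.0, 0.0], [0.0, 1.0]], [0.0, -3.0])` first forms the weighted sum `[1.25, 2.0]` and then applies the linear layer and ReLU. An MLP operation would stack several such layers.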
In some embodiments, because neighbor nodes with different relationships to the given node may contribute differently when their information is aggregated, the weights specific to the set of neighbor nodes may be a first set of weights specific to the types of the relationships indicated by the edges between the set of neighbor nodes and the given node. As mentioned above, the relationships indicated by the edges in the sentence graph 512 include different types of syntactic relationships, the sequential relationship, and/or the self-relationship. Weights in the first set may be the same for the same type of relationship, but may vary across different types of relationships. A graph convolution based on the first set of weights may be referred to as a first graph convolution, or may sometimes be referred to as an edge-based graph convolution because the weights depend on the relationships indicated by the edges.
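As a non-limiting sketch of how weights may be shared per relationship type, the example below looks up one weight per edge type and applies it to every neighbor connected by an edge of that type. The function name `edge_based_aggregate`, the relationship labels `"nsubj"`, `"sequential"`, and `"self"`, and the use of scalar rather than matrix weights are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def edge_based_aggregate(
    neighbors: Sequence[Tuple[str, Vector]],
    type_weights: Dict[str, float],
) -> Vector:
    """Weighted sum of neighbor embeddings, with the weight chosen by edge type.

    Each neighbor is a (relationship_type, embedding) pair. Neighbors that
    share a relationship type share the same weight.
    """
    dim = len(neighbors[0][1])
    aggregated = [0.0] * dim
    for relation, embedding in neighbors:
        weight = type_weights[relation]  # KeyError if the type is unknown
        for i in range(dim):
            aggregated[i] += weight * embedding[i]
    return aggregated


# Hypothetical weights: one per relationship type, not one per neighbor.
example_weights = {"nsubj": 0.5, "sequential": 0.25, "self": 1.0}
example_neighbors = [
    ("nsubj", [2.0, 0.0]),
    ("sequential", [4.0, 4.0]),
    ("sequential", [0.0, 8.0]),
    ("self", [1.0, 1.0]),
]
example_result = edge_based_aggregate(example_neighbors, example_weights)
```

Here the two neighbors reached through sequential edges are scaled by the same weight, whereas the syntactic neighbor and the self-loop each use the weight of their own type.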
The first graph convolution based on the first set of weights performed at each layer 710, . . . , 720 may be represented as follows: