Further, common link prediction algorithms for networks roughly fall into two broad categories: learning-based and similarity-based. Learning-based methods treat link prediction as a binary classification problem and train a machine learning model to predict the class label (i.e., positive for a potential link) for each non-connected node pair. One related art approach is feature-based classification, which extracts features from node attributes, topological structures, social theories, or combinations thereof. Another is based on probabilistic graph models, including the relational model, the entity-relationship model, and so forth. Although effective, these techniques are less general because they often require extra information (e.g., semantic node attributes) beyond the observed network structure. Moreover, the trained machine learning models may only perform well on networks with certain characteristics, depending on the training set.
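As a concrete illustration of the feature-based classification approach, the following is a minimal sketch that trains a binary classifier on simple topological features of node pairs. The toy graph, the chosen features, and the use of networkx and scikit-learn are illustrative assumptions, not part of the approaches described above.

```python
# A minimal sketch of feature-based link prediction as binary classification.
# The toy graph, feature set, and classifier choice are illustrative assumptions;
# networkx and scikit-learn are assumed to be installed.
import itertools

import networkx as nx
from sklearn.linear_model import LogisticRegression

# Toy observed network.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)])

def pair_features(graph, u, v):
    """Topological features for a candidate node pair (u, v)."""
    common = len(list(nx.common_neighbors(graph, u, v)))   # common-neighbor count
    pref_attach = graph.degree(u) * graph.degree(v)        # preferential attachment
    return [common, pref_attach]

# Training set: observed edges as positives, non-connected pairs as negatives.
# (In practice, positive edges would be held out before computing features.)
positives = list(G.edges())
negatives = [p for p in itertools.combinations(G.nodes(), 2) if not G.has_edge(*p)]

X = [pair_features(G, u, v) for u, v in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)

clf = LogisticRegression().fit(X, y)

# Estimated probability that the non-connected pair (0, 4) should be linked.
print(clf.predict_proba([pair_features(G, 0, 4)])[0][1])
```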
On the other hand, similarity-based methods compute a similarity score for every non-connected pair of nodes and rank all of these potential links. Ways of computing the similarity metrics include random-walk-based simulation and neighbor-based measures such as common neighbors, the Jaccard coefficient, the Adamic-Adar coefficient, and preferential attachment. Researchers have extended some of these similarity metrics to the bipartite network scenario. Example implementations move one step further by proposing a family of ensemble methods that integrate an important type of structural information in bipartite networks, bicliques, to improve prediction performance.
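For the similarity-based category, the following sketch ranks all non-connected pairs by one of the neighbor-based indices mentioned above. The toy graph is an illustrative assumption; the networkx index functions shown follow a common (u, v, score) interface.

```python
# A minimal sketch of similarity-based ranking over non-connected pairs.
# The toy graph is an illustrative assumption; networkx is assumed to be installed.
import networkx as nx

G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)])

# Candidate links: all non-connected node pairs.
candidates = list(nx.non_edges(G))

# networkx exposes several neighbor-based indices with the same interface,
# e.g. nx.jaccard_coefficient and nx.preferential_attachment; here the
# candidates are ranked by the Adamic-Adar index, highest score first.
ranked = sorted(nx.adamic_adar_index(G, candidates), key=lambda t: t[2], reverse=True)

for u, v, score in ranked[:3]:
    print(f"({u}, {v}) -> Adamic-Adar {score:.3f}")
```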