The data store 114 may further store a computer-implemented model 120 that may be executed by the textual extraction application 106 in order to extract relevant textual elements from the document 118. In an embodiment, the computer-implemented model 120 may be a computer-implemented machine learning model. The computer-implemented model 120 is generated based upon a plurality of documents having a defined type, wherein characteristics (described in greater detail below) of at least some documents in the plurality of documents vary. In general, the computer-implemented model 120 is configured to take, as input, computer-readable text from a document (e.g., the document 118) having the defined type and positions of the computer-readable text within the document. The computer-implemented model 120 is configured to output, based upon the input, a plurality of textual elements from the computer-readable text and a respective score assigned to each textual element in the plurality of textual elements. Each score is indicative of a likelihood that the textual element to which the score is assigned represents relevant content in the document based upon defined criteria (described in greater detail below) for the defined type of the document.
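By way of illustration only, the input and output described above may be sketched as follows. The sketch is a minimal example under stated assumptions and is not the computer-implemented model 120 itself: the names `PositionedText`, `ScoredElement`, `extract_elements`, and `toy_score` are hypothetical, the coordinate units are arbitrary, and the scoring rule is a hand-written stand-in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PositionedText:
    """Computer-readable text plus its position within the document (hypothetical layout)."""
    text: str
    page: int
    x: float  # horizontal offset, arbitrary units (assumption)
    y: float  # vertical offset from the top of the page, arbitrary units (assumption)


@dataclass(frozen=True)
class ScoredElement:
    """A textual element and the score assigned to it."""
    element: PositionedText
    score: float  # higher means more likely to be relevant content


def extract_elements(
    inputs: List[PositionedText],
    score_fn: Callable[[PositionedText], float],
) -> List[ScoredElement]:
    """Assign a score to every textual element and return them highest score first."""
    scored = [ScoredElement(item, score_fn(item)) for item in inputs]
    return sorted(scored, key=lambda s: s.score, reverse=True)


def toy_score(item: PositionedText) -> float:
    """Stand-in for a trained model: favors text that contains a digit and sits near the top."""
    has_digit = any(ch.isdigit() for ch in item.text)
    near_top = item.y < 200.0
    return 0.5 * has_digit + 0.5 * near_top


if __name__ == "__main__":
    page = [
        PositionedText("Thank you for your business", page=1, x=50.0, y=700.0),
        PositionedText("Invoice No. 4471", page=1, x=50.0, y=100.0),
    ]
    for result in extract_elements(page, toy_score):
        print(f"{result.score:.2f}  {result.element.text}")
```

In this sketch the scoring function is passed in as a parameter, so that a trained model could be substituted for `toy_score` without changing the surrounding input and output structures.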
In an embodiment, the computer-implemented model 120 may be or include a predictive model. The predictive model may be or include a continuous bag-of-words model, a skip-gram model, or a weighted n-gram differences model. In another embodiment, the computer-implemented model 120 may be or include a count-based model, such as a Latent Semantic Analysis (LSA) model. In an embodiment, the computer-implemented model 120 may incorporate t-distributed stochastic neighbor embedding (t-SNE) techniques.
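For context, the difference between a skip-gram model and a continuous bag-of-words model lies in how training examples are formed from a sequence of words: a skip-gram model predicts each neighboring word from a center word, whereas a continuous bag-of-words model predicts the center word from its neighbors taken together. The following is a minimal sketch of that example-forming step only, assuming a symmetric context window; the function names and the sample tokens are hypothetical, and no training of the computer-implemented model 120 is shown.

```python
from typing import List, Tuple


def skip_gram_pairs(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Return (center, context) pairs: the center word is used to predict each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(
    tokens: List[str], window: int = 1
) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (context_words, center) examples: the neighbors jointly predict the center."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        if context:  # skip single-token inputs, which have no neighbors
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    words = ["total", "amount", "due"]  # hypothetical sample text
    print(skip_gram_pairs(words))
    print(cbow_examples(words))
```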