Referring now to FIG. 7, a methodology 700 executed by a computing device for generating a computer-implemented model is illustrated. The methodology 700 begins at 702, and at 704, the computing device accesses a plurality of documents of a defined type from a data store. Each document in the plurality of documents comprises computer-readable text and a layout that defines positions of the computer-readable text within a two-dimensional area represented by each document. At least some documents in the plurality of documents have characteristics that vary. The characteristics may include varying portions of computer-readable text, varying positions of the computer-readable text within the documents, varying typographical emphasis of portions of the computer-readable text, varying areas of the plurality of documents, varying lengths and widths of the plurality of documents, varying font types of the portions of the computer-readable text, and/or varying font sizes of the portions of the computer-readable text.
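By way of non-limiting illustration only, the documents accessed at 704 may be represented in memory roughly as sketched below. The class names, field names, the "resume" document type, and the normalization by a 72-point font size are hypothetical choices made for this example and are not taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextElement:
    """A portion of computer-readable text and its place in the layout (hypothetical)."""
    text: str
    x: float                  # horizontal position within the two-dimensional area
    y: float                  # vertical position within the two-dimensional area
    font_type: str = "serif"
    font_size: float = 10.0   # in points
    bold: bool = False        # one possible form of typographical emphasis


@dataclass
class Document:
    """A document of a defined type; width and length may vary between documents."""
    doc_type: str
    width: float
    length: float
    elements: List[TextElement] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.length


def element_features(doc: Document, el: TextElement) -> List[float]:
    """Scale position by document size so documents of differing dimensions are comparable."""
    return [
        el.x / doc.width,
        el.y / doc.length,
        el.font_size / 72.0,  # assumed normalization constant
        1.0 if el.bold else 0.0,
    ]


# Hypothetical example document with a single emphasized heading.
doc = Document("resume", 8.5, 11.0,
               [TextElement("Jane Doe", 4.25, 0.55, font_size=18.0, bold=True)])
features = element_features(doc, doc.elements[0])
```

Normalizing positions by the width and length of each document is one way, among others, to let a single model consume documents whose areas vary.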
At 706, the computing device generates a computer-implemented model based upon the plurality of documents. The computer-implemented model is configured to take, as input, at least some of the characteristics described above from a document of the defined type. The computer-implemented model outputs, based upon the input, a plurality of textual elements in the document and scores assigned to the plurality of textual elements. A score in the scores is indicative of a likelihood that at least one textual element in the plurality of textual elements represents relevant content in the document based upon defined criteria for the defined type. The methodology 700 concludes at 708.
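As one possible, non-limiting sketch of act 706, the computer-implemented model may be approximated by a logistic-regression classifier fit to per-element feature vectors, with the resulting probability serving as the score for each textual element. Logistic regression, the function names, the three-value feature layout, and the training values below are assumptions made for illustration; the description above does not specify a model family or training data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def generate_model(examples: Sequence[Tuple[Sequence[float], int]],
                   epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Fit logistic-regression weights (bias stored last) by stochastic gradient descent."""
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for feats, label in examples:
            x = list(feats) + [1.0]  # append constant bias input
            pred = sigmoid(sum(w * v for w, v in zip(weights, x)))
            err = pred - label       # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * v for w, v in zip(weights, x)]
    return weights


def score_elements(weights: Sequence[float],
                   elements: Sequence[Tuple[str, Sequence[float]]]) -> List[Tuple[str, float]]:
    """Return (text, score) pairs, highest score first."""
    scored = []
    for text, feats in elements:
        x = list(feats) + [1.0]
        scored.append((text, sigmoid(sum(w * v for w, v in zip(weights, x)))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical features: [normalized vertical position, normalized font size, bold flag].
# Label 1 marks an element deemed relevant under the defined criteria, 0 otherwise.
training = [
    ([0.05, 0.25, 1.0], 1),
    ([0.10, 0.20, 1.0], 1),
    ([0.50, 0.14, 0.0], 0),
    ([0.90, 0.11, 0.0], 0),
]
weights = generate_model(training)
scored = score_elements(weights, [
    ("References available upon request", [0.70, 0.12, 0.0]),
    ("Jane Doe", [0.06, 0.22, 1.0]),
])
```

In this sketch each score lies between 0 and 1 and may be read as a likelihood that the corresponding textual element represents relevant content; other model families may be substituted without changing the inputs and outputs described above.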