In operation, a computing device that executes the textual extraction application receives a document comprising computer-readable text and a layout. The computer-readable text may include letters, numbers, punctuation, and/or mathematical symbols. The layout defines positions of the computer-readable text within a two-dimensional area represented by the document. The document may have a defined type, wherein the defined type is indicative of a purpose of the document. In an example, a defined type of a document may be an educational transcript, and as such, computer-readable text of the educational transcript may be indicative of classes taken by a student, credit hours received by the student for the classes, and grades that the student received in the classes. In a further example, portions of the computer-readable text and/or the layout of the document may not have been encountered previously by the textual extraction application.
Responsive to receiving the document, the textual extraction application identifies at least one textual element in the computer-readable text based upon spatial factors between portions of the computer-readable text in the document and contextual relationships between the portions of the computer-readable text. The spatial factors may include distances between the portions of the computer-readable text, angles between the portions of the computer-readable text and an axis of the document, and/or orderings between the portions of the computer-readable text. The textual extraction application may calculate the spatial factors based upon the positions of the computer-readable text within the document. The contextual relationships are determined via at least one computer-implemented model. Exemplary contextual relationships include source to object, object to use, person to location, whole to part, and/or type to subtype.
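By way of illustration only, the calculation of spatial factors from positions may be sketched as follows. The sketch is not the claimed implementation; it assumes that each portion of text is reduced to a single anchor point, that the axis of the document is the horizontal axis, and that ordering means top-to-bottom, then left-to-right reading order. The names `TextPortion`, `distance`, `angle_to_horizontal_axis`, `reading_order`, and `spatial_factors` are hypothetical and do not appear elsewhere in this description.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPortion:
    """Hypothetical record: a portion of text and its position in the layout."""
    text: str
    x: float  # horizontal position of the anchor point
    y: float  # vertical position of the anchor point, increasing downward


def distance(a: TextPortion, b: TextPortion) -> float:
    """Euclidean distance between the anchor points of two portions."""
    return math.hypot(b.x - a.x, b.y - a.y)


def angle_to_horizontal_axis(a: TextPortion, b: TextPortion) -> float:
    """Angle, in degrees, between the line from a to b and the horizontal axis."""
    return math.degrees(math.atan2(b.y - a.y, b.x - a.x))


def reading_order(portions: list[TextPortion]) -> list[TextPortion]:
    """Assumed ordering: top to bottom, then left to right."""
    return sorted(portions, key=lambda p: (p.y, p.x))


def spatial_factors(a: TextPortion, b: TextPortion,
                    portions: list[TextPortion]) -> dict:
    """Collect the three example spatial factors for a pair of portions."""
    order = reading_order(portions)
    return {
        "distance": distance(a, b),
        "angle_deg": angle_to_horizontal_axis(a, b),
        "a_precedes_b": order.index(a) < order.index(b),
    }


# Illustrative values only: a course name and a grade on the same row
# of a transcript, with a heading above them.
heading = TextPortion("Fall Term", 0.0, 0.0)
course = TextPortion("Calculus I", 0.0, 20.0)
grade = TextPortion("A", 300.0, 20.0)
factors = spatial_factors(course, grade, [grade, course, heading])
```

In this sketch, two portions on the same row yield an angle of zero degrees relative to the horizontal axis, which is one way that such a pair could be distinguished from portions that are stacked vertically. Other choices of anchor point, axis, or ordering are equally possible.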