In an embodiment, the computing device 100 may be in communication with a scanner (not shown). The scanner may generate the document image 116 by scanning a physical copy of a document.
The data store 114 also stores a document 118. The document 118 comprises computer-readable text (i.e., text that is searchable by the computing device 100) and a layout. The computer-readable text may include combinations of American Standard Code for Information Interchange (ASCII) characters and/or combinations of Unicode characters. For instance, the computer-readable text may include letters, numbers, punctuation, and/or mathematical symbols.
The layout defines positions of the computer-readable text within a two-dimensional area represented by the document 118. Thus, the document 118 has a length and a width. In a non-limiting example, the two-dimensional area may correspond to an A4 paper size, a letter paper size, or a legal paper size.
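By way of non-limiting illustration, a document comprising computer-readable text and a layout may be sketched as follows. The names `TextSpan`, `Document`, `add_span`, `search`, and `PAGE_SIZES_MM` are hypothetical and are introduced here only for illustration, as is the choice of millimetres as the unit and of a top-left origin; the sketch is one possible representation and not a definitive implementation of the document 118.

```python
from dataclasses import dataclass, field

# Standard page sizes in millimetres as (width, length).
# The dictionary name and the unit are assumptions made for this sketch.
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "letter": (215.9, 279.4),
    "legal": (215.9, 355.6),
}


@dataclass
class TextSpan:
    """A run of computer-readable text and its position in the layout."""
    text: str
    x: float  # horizontal offset from the left edge
    y: float  # vertical offset from the top edge


@dataclass
class Document:
    """Hypothetical model: searchable text plus a two-dimensional layout."""
    width: float
    length: float
    spans: list = field(default_factory=list)

    def add_span(self, text, x, y):
        # Reject positions that fall outside the two-dimensional area.
        if not (0 <= x <= self.width and 0 <= y <= self.length):
            raise ValueError("position lies outside the document area")
        self.spans.append(TextSpan(text, x, y))

    def search(self, query):
        # The text is computer-readable, so a substring search suffices here.
        return [span for span in self.spans if query in span.text]


width, length = PAGE_SIZES_MM["A4"]
doc = Document(width, length)
doc.add_span("Invoice No. 42", 20.0, 25.0)
doc.add_span("Area: 2 \u00d7 3", 20.0, 40.0)  # includes a Unicode symbol
```

In this sketch, the layout is simply the set of (x, y) positions carried by the spans, and the length and width bound the area in which those positions are valid.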
In an embodiment, the document 118 may be a tabular document in which the computer-readable text is arranged within one or more tables. Thus, in this embodiment, the layout of the document 118 may define positions of the computer-readable text within the one or more tables.
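As a further non-limiting illustration, a layout that positions text within one or more tables may be sketched as follows. The names `Cell`, `Table`, `TabularDocument`, `cell_text`, and `find` are hypothetical, and addressing each cell by a zero-based row and column index is an assumption made for this sketch rather than a requirement of the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Computer-readable text located at a row and column of a table."""
    row: int
    col: int
    text: str


@dataclass
class Table:
    """A table placed at (x, y) within the document's two-dimensional area."""
    x: float
    y: float
    cells: list = field(default_factory=list)

    def cell_text(self, row, col):
        # Return the text at the given position, or None if the cell is empty.
        for cell in self.cells:
            if cell.row == row and cell.col == col:
                return cell.text
        return None


@dataclass
class TabularDocument:
    """Hypothetical model of a document whose text is arranged in tables."""
    tables: list = field(default_factory=list)

    def find(self, query):
        # Report (table index, row, column) for every cell containing the query.
        hits = []
        for index, table in enumerate(self.tables):
            for cell in table.cells:
                if query in cell.text:
                    hits.append((index, cell.row, cell.col))
        return hits


table = Table(x=20.0, y=30.0, cells=[
    Cell(0, 0, "Item"), Cell(0, 1, "Quantity"),
    Cell(1, 0, "Widget"), Cell(1, 1, "3"),
])
tabular_doc = TabularDocument(tables=[table])
```

Here the layout is expressed at two levels: the position of each table within the document, and the row and column of each piece of text within its table.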