
Computing system for extraction of textual elements from a document

Patent number
US11176364B2
Publication date
2021-11-16
Applicant
Hyland Software, Inc. (US OH Westlake)
Inventors
Ralph Meier; Thorsten Wanschura; Johannes Hausmann; Harry Urbschat
IPC classification
G06K9/00; G06K9/20; G06T7/70; G06K9/72; G06T7/50; G06K9/62
Technical field
textual, document, text, computer, readable, in, extraction, element, computing, documents
Region: OH Westlake

Abstract

Described herein are various technologies pertaining to text extraction from a document. A computing device receives the document. The document comprises computer-readable text and a layout, wherein the layout defines positions of the computer-readable text within a two-dimensional area represented by the document. Responsive to receiving the document, the computing device identifies at least one textual element in the computer-readable text based upon spatial factors between portions of the computer-readable text and contextual relationships between the portions of the computer-readable text. The computing device then outputs the at least one textual element.

Description

In an embodiment, the textual extraction application 106 may calculate string metrics for portions of the computer-readable text in the document 118. For instance, the string metrics may include Levenshtein distance, Damerau-Levenshtein distance, longest common subsequence (LCS) distance, Hamming distance, and/or Jaro distance. The textual extraction application 106 may further identify the at least one textual element based upon the string metrics.
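As an illustration only, three of the string metrics named above (Levenshtein distance, Hamming distance, and LCS distance) can be sketched in Python as follows. The helper closest_label and its sample labels are hypothetical and are not taken from the patent; they show one way such a metric might help match a noisy portion of text to a known field label, and do not represent the method the textual extraction application 106 actually uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_distance(a: str, b: str) -> int:
    """Edit distance when only insertions and deletions are allowed."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def closest_label(token: str, labels: list[str]) -> tuple[str, int]:
    """Hypothetical helper: return the known label nearest to a (possibly noisy) token.

    Comparison is case-insensitive; ties go to the earliest label in the list.
    """
    scored = [(levenshtein(token.lower(), label.lower()), label) for label in labels]
    distance, label = min(scored, key=lambda pair: pair[0])
    return label, distance


if __name__ == "__main__":
    # "lnvoice" stands in for an OCR misreading of "Invoice".
    print(closest_label("lnvoice", ["Invoice", "Total", "Date"]))
```

A small distance suggests that the portion of text is a misspelled or misrecognized variant of a known label, which is one plausible way string metrics could feed into identifying a textual element.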

Responsive to identifying the at least one textual element, the textual extraction application 106 outputs the at least one textual element. In an example, the textual extraction application 106 may output the at least one textual element by presenting the at least one textual element as part of the graphical features 110 presented on the display 108 of the computing device 100. In another example, the textual extraction application 106 may output the at least one textual element by storing the at least one textual element in a data structure that is conducive to further data processing. For instance, the textual extraction application 106 may cause the at least one textual element to be stored in an eXtensible Markup Language (XML) file (e.g., an XML-based spreadsheet), in a comma separated value (CSV) file, or as an entry in a database. The textual extraction application 106 may store the at least one textual element from the document 118 as part of the extracted textual elements 122 stored in the data store 114.
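The storage options described above (an XML file, a CSV file, or a database entry) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the TextualElement record, its label and value fields, the table name extracted_elements, and the XML tag names are all hypothetical choices made for illustration, since the patent text does not specify a schema.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TextualElement:
    """Hypothetical record for one extracted textual element."""
    label: str
    value: str


def to_csv(elements: list[TextualElement]) -> str:
    """Serialize the elements as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "value"])
    for element in elements:
        writer.writerow([element.label, element.value])
    return buffer.getvalue()


def to_xml(elements: list[TextualElement]) -> str:
    """Serialize the elements as a small XML document (tag names are assumptions)."""
    root = ET.Element("elements")
    for element in elements:
        node = ET.SubElement(root, "element", label=element.label)
        node.text = element.value
    return ET.tostring(root, encoding="unicode")


def to_database(elements: list[TextualElement], conn: sqlite3.Connection) -> None:
    """Store each element as a row in an assumed 'extracted_elements' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_elements (label TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO extracted_elements (label, value) VALUES (?, ?)",
        [(element.label, element.value) for element in elements],
    )
    conn.commit()


if __name__ == "__main__":
    sample = [TextualElement("invoice_number", "INV-001")]
    print(to_csv(sample))
    print(to_xml(sample))
    with sqlite3.connect(":memory:") as connection:
        to_database(sample, connection)
        print(connection.execute("SELECT * FROM extracted_elements").fetchall())
```

Returning strings from to_csv and to_xml keeps the sketch independent of any file layout; writing them to disk or to the data store 114 would be a separate step.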

Claims

1