Computing system for extraction of textual elements from a document

Patent number
US11176364B2
Publication date
2021-11-16
Applicant
Hyland Software, Inc. (Westlake, OH, US)
Inventors
Ralph Meier; Thorsten Wanschura; Johannes Hausmann; Harry Urbschat
IPC classification
G06K9/00; G06K9/20; G06T7/70; G06K9/72; G06T7/50; G06K9/62
Technical field
textual,document,text,computer,readable,in,extraction,element,computing,documents
Region: Westlake, OH

Abstract

Described herein are various technologies pertaining to text extraction from a document. A computing device receives the document. The document comprises computer-readable text and a layout, wherein the layout defines positions of the computer-readable text within a two-dimensional area represented by the document. Responsive to receiving the document, the computing device identifies at least one textual element in the computer-readable text based upon spatial factors between portions of the computer-readable text and contextual relationships between the portions of the computer-readable text. The computing device then outputs the at least one textual element.

Description

Referring now to FIG. 7, a methodology 700 executed by a computing device for generating a computer-implemented model is illustrated. The methodology 700 begins at 702, and at 704, the computing device accesses a plurality of documents of a defined type from a data store. Each document in the plurality of documents comprises computer-readable text and a layout that defines positions of the computer-readable text within a two-dimensional area represented by each document. At least some documents in the plurality of documents have characteristics that vary. The characteristics may include varying portions of computer-readable text, varying positions of the computer-readable text within the documents, varying typographical emphasis of portions of the computer-readable text, varying areas of the plurality of documents, varying lengths and widths of the plurality of documents, varying font types of the portions of the computer-readable text, and/or varying font sizes of the portions of the computer-readable text.
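
As an illustration of step 704, the following is a minimal sketch of how documents with computer-readable text and a layout might be represented and accessed by type. The patent text does not prescribe any data structure; the names `TextPortion`, `Document`, `access_documents`, and `varying_characteristics` are hypothetical, and the data store is assumed here to be a plain in-memory list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextPortion:
    """One portion of computer-readable text and its place in the layout."""
    text: str
    x: float                    # horizontal position within the two-dimensional area
    y: float                    # vertical position within the two-dimensional area
    font_type: str = "unknown"
    font_size: float = 10.0
    emphasized: bool = False    # typographical emphasis such as bold or italics


@dataclass
class Document:
    """A document of a defined type: page dimensions plus text portions."""
    doc_type: str
    width: float
    length: float
    portions: List[TextPortion] = field(default_factory=list)


def access_documents(data_store: List[Document], doc_type: str) -> List[Document]:
    """Return every document of the defined type (assumed in-memory store)."""
    return [doc for doc in data_store if doc.doc_type == doc_type]


def varying_characteristics(docs: List[Document]) -> dict:
    """Report which characteristics differ across the accessed documents."""
    return {
        "dimensions": len({(d.width, d.length) for d in docs}) > 1,
        "font_types": len({p.font_type for d in docs for p in d.portions}) > 1,
        "font_sizes": len({p.font_size for d in docs for p in d.portions}) > 1,
        "positions": len({(p.x, p.y) for d in docs for p in d.portions}) > 1,
    }
```

Keeping the position and typography on each text portion, rather than on the document as a whole, lets the same structure carry every varying characteristic listed above.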

At 706, the computing device generates a computer-implemented model based upon the plurality of documents. The computer-implemented model is configured to take, as input, at least some of the characteristics described above from a document of the defined type. The computer-implemented model outputs, based upon the input, a plurality of textual elements in the document and scores assigned to the plurality of textual elements. A score in the scores is indicative of a likelihood that at least one textual element in the plurality of textual elements represents relevant content in the document based upon defined criteria for the defined type. The methodology 700 concludes at 708.
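
The input and output shape of step 706 can be sketched as follows. This is a toy stand-in, not the model the patent describes: it assumes (hypothetically) that the training documents carry a `relevant` annotation and that positions are normalized to the range 0 to 1, and it scores elements purely by distance to the average position of annotated elements. The names `Element`, `PositionModel`, and `generate_model` are illustrative, and the score is a simple ranking value rather than a calibrated likelihood.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    """A textual element with its position normalized to the page (0..1)."""
    text: str
    x: float
    y: float
    relevant: bool = False  # hypothetical training annotation, not from the source


class PositionModel:
    """Toy model that remembers where relevant elements tend to appear."""

    def __init__(self, cx: float, cy: float):
        self.cx, self.cy = cx, cy

    def score(self, elements: List[Element]) -> List[Tuple[str, float]]:
        """Return (text, score) pairs, highest score first.

        The score is 1.0 at the learned position and decreases with distance.
        """
        results = []
        for e in elements:
            dist = ((e.x - self.cx) ** 2 + (e.y - self.cy) ** 2) ** 0.5
            results.append((e.text, 1.0 / (1.0 + dist)))
        return sorted(results, key=lambda pair: pair[1], reverse=True)


def generate_model(documents: List[List[Element]]) -> PositionModel:
    """Average the positions of annotated elements across training documents."""
    relevant = [e for doc in documents for e in doc if e.relevant]
    if not relevant:
        raise ValueError("training documents contain no annotated elements")
    cx = sum(e.x for e in relevant) / len(relevant)
    cy = sum(e.y for e in relevant) / len(relevant)
    return PositionModel(cx, cy)
```

A model along the lines of the description would draw on more of the characteristics listed earlier, such as font size and typographical emphasis; position alone is used here only to keep the sketch short.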

Claims

1