
Computing system for extraction of textual elements from a document

Patent No.
US11176364B2
Publication Date
2021-11-16
Applicant
Hyland Software, Inc. (US OH Westlake)
Inventors
Ralph Meier; Thorsten Wanschura; Johannes Hausmann; Harry Urbschat
IPC Classification
G06K9/00; G06K9/20; G06T7/70; G06K9/72; G06T7/50; G06K9/62
Technical Field
textual, document, text, computer, readable, in, extraction, element, computing, documents
Region: Westlake, OH

Abstract

Described herein are various technologies pertaining to text extraction from a document. A computing device receives the document. The document comprises computer-readable text and a layout, wherein the layout defines positions of the computer-readable text within a two-dimensional area represented by the document. Responsive to receiving the document, the computing device identifies at least one textual element in the computer-readable text based upon spatial factors between portions of the computer-readable text and contextual relationships between the portions of the computer-readable text. The computing device then outputs the at least one textual element.
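The abstract does not say how the spatial factors are computed. As a minimal sketch, assuming each word of computer-readable text carries hypothetical `x`, `y`, and `width` layout coordinates, words can be grouped into candidate textual elements by treating words with nearly equal `y` as one line and splitting a line wherever the horizontal gap exceeds a threshold. The `Word` record, `group_by_layout`, and the `line_tol`/`max_gap` parameters are illustrative assumptions, and the contextual relationships mentioned in the abstract are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical layout record: one word and its position on the page."""
    text: str
    x: float      # left edge of the word
    y: float      # vertical position of the word's line
    width: float  # horizontal extent of the word


def group_by_layout(words, line_tol=2.0, max_gap=10.0):
    """Group words into candidate textual elements using spatial factors only.

    Words whose y positions differ by at most line_tol are placed on the same line;
    within a line, a horizontal gap larger than max_gap starts a new element.
    """
    # Order words top-to-bottom, then left-to-right.
    ordered = sorted(words, key=lambda w: (w.y, w.x))

    # Cluster words into lines by vertical proximity.
    lines = []
    for w in ordered:
        if lines and abs(lines[-1][-1].y - w.y) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])

    # Split each line into elements at large horizontal gaps.
    elements = []
    for line in lines:
        line.sort(key=lambda w: w.x)
        current = [line[0]]
        for prev, w in zip(line, line[1:]):
            gap = w.x - (prev.x + prev.width)
            if gap > max_gap:
                elements.append(" ".join(t.text for t in current))
                current = [w]
            else:
                current.append(w)
        elements.append(" ".join(t.text for t in current))
    return elements
```

For example, on a line containing "Invoice No." followed far to the right by "12345", the large gap splits the line into two candidate elements, while "Total: $99.00" on the next line stays together.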

Description

The data store 114 may further store a computer-implemented model 120 that may be executed by the textual extraction application 106 in order to extract relevant textual elements from the document 118. In an embodiment, the computer-implemented model 120 may be a computer-implemented machine learning model. The computer-implemented model 120 is generated based upon a plurality of documents having a defined type, wherein characteristics (described in greater detail below) of at least some documents in the plurality of documents vary. In general, the computer-implemented model 120 is configured to take, as input, computer-readable text from a document (e.g., the document 118) having a defined type, together with the positions of the computer-readable text within the document. Based upon this input, the computer-implemented model 120 is configured to output a plurality of textual elements from the computer-readable text and a score assigned to each textual element in the plurality of textual elements. Each score is indicative of a likelihood that the corresponding textual element represents relevant content in the document, based upon defined criteria (described in greater detail below) for the defined type of the document.
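The description fixes only the model's contract: text plus positions in, textual elements with likelihood scores out. The sketch below illustrates that contract with a toy stand-in, not the patented model. `toy_model`, its two features (share of digits, closeness to the top of the page), and its hand-set weights are hypothetical; a real model 120 would learn its scoring from a plurality of documents of the defined type.

```python
import math
from typing import List, Tuple

# Hypothetical input format: (text, x, y) for each piece of computer-readable text.
Token = Tuple[str, float, float]


def toy_model(tokens: List[Token], bias=-2.0, w_digits=3.0, w_top=1.5,
              page_height=100.0) -> List[Tuple[str, float]]:
    """Stand-in for a trained model: return (textual_element, score) pairs.

    Each score is a logistic function of two illustrative features, so it lies
    in (0, 1) and can be read as a likelihood of being relevant content.
    Results are sorted from most to least likely.
    """
    results = []
    for text, _x, y in tokens:
        # Feature 1: fraction of characters that are digits (e.g. invoice numbers).
        digits = sum(c.isdigit() for c in text) / max(len(text), 1)
        # Feature 2: 1.0 at the top of the page, 0.0 at the bottom.
        top = 1.0 - min(max(y / page_height, 0.0), 1.0)
        z = bias + w_digits * digits + w_top * top
        results.append((text, 1.0 / (1.0 + math.exp(-z))))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling `toy_model([("12345", 80.0, 5.0), ("Thank you", 10.0, 95.0)])` ranks "12345" (score about 0.92) above "Thank you" (about 0.13). Keeping the output as scored pairs rather than a hard yes/no lets a caller apply its own threshold per document type.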

In an embodiment, the computer-implemented model 120 may be or include a predictive model, such as a continuous bag-of-words model, a skip-gram model, or a weighted n-gram differences model. In another embodiment, the computer-implemented model 120 may be or include a count-based model, such as a Latent Semantic Analysis (LSA) model. In an embodiment, the computer-implemented model 120 may incorporate t-distributed stochastic neighbor embedding (t-SNE) techniques.
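Of the predictive models listed, continuous bag-of-words (CBOW) and skip-gram differ mainly in how training examples are drawn from a context window. The sketch below shows only that difference: CBOW predicts a target word from its surrounding words, while skip-gram predicts each surrounding word from the center word. The function names and the `window` parameter are illustrative; the embedding training itself is omitted, and nothing here reflects how model 120 is actually configured.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per neighbor within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: one (context_words, target) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        # Neighbors on the left and right of the target, up to `window` each.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, target))
    return examples
```

With `window=1`, the tokens `["invoice", "total", "due", "date"]` yield six skip-gram pairs but only four CBOW examples, one per target word.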

Claims
