
Handwriting detector, extractor, and language classifier

Patent number
US11176361B2
Publication date
2021-11-16
Applicant
Raytheon Company (Waltham, MA, US)
Inventors
Darrell L. Young; Kevin C. Holley
IPC classification
G06F40/171; G06F40/263; G06K9/00; G06K9/34; G06K9/38; G06K9/62; G06K9/68; G06K9/72
Technical field
language, may, or, in, bounding, be, hardware, features, geometric, image
Region: Waltham, MA

Abstract

Disclosed are methods for handwriting recognition. In some aspects, an image representing a page of a sample document is analyzed to identify a region having indications of handwriting. The region is analyzed to determine frequencies of a plurality of geometric features within the region. The frequencies may be compared to profiles or histograms of known language types, to determine if there are similarities between the frequencies in the sample document relative to those of the known language types. In some aspects, machine learning may be used to characterize the document as a particular language type based on the frequencies of the geometric features.
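The comparison of feature frequencies against known language profiles can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature labels ("loop", "vertical_stroke", "dot"), the two profile histograms, and the choice of cosine similarity as the comparison measure are hypothetical and are not taken from the patent.

```python
from collections import Counter
import math


def feature_frequencies(features):
    """Turn a list of observed feature labels into relative frequencies."""
    counts = Counter(features)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse histograms stored as dicts."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def closest_language(sample_features, profiles):
    """Return the profile name whose histogram is most similar to the sample."""
    freqs = feature_frequencies(sample_features)
    return max(profiles, key=lambda lang: cosine_similarity(freqs, profiles[lang]))


# Hypothetical profiles; real ones would be measured from labeled documents.
PROFILES = {
    "language_a": {"loop": 0.5, "vertical_stroke": 0.3, "dot": 0.2},
    "language_b": {"loop": 0.1, "vertical_stroke": 0.2, "dot": 0.7},
}

sample = ["dot", "dot", "dot", "loop", "vertical_stroke", "dot"]
print(closest_language(sample, PROFILES))  # prints "language_b"
```

A nearest-profile rule like this is only one way to use the frequencies; the abstract also allows a machine learning classifier to take the same frequencies as input.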

Description

FIG. 6 illustrates detected regions of images and handwriting in a document 600 according to various embodiments. The document 600 is a page from a travel article about Washington D.C. Various embodiments use a trained deep CNN to recognize handwriting, machine printed text, and images. In an embodiment, the deep CNN is trained using a training set of printed text, handwriting, and images. The boxes 602a-d in the document 600 show handwriting detected by the deep CNN. The boxes 604a-f are detected text. The green boxes 606 are detected images. In an example, the document 600 may be represented as an image. The image may be provided as input to the deep CNN. The deep CNN may then detect images inside the overall document image. Similar to the line cut problem, image 610 may be falsely detected as handwriting. In some embodiments, a configuration option is provided that selects whether to continue to process candidate handwriting on top of detected images. For example, via the configuration option, an embodiment may be configured either to ignore or to process handwriting within an image.
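The configuration option for handwriting that lies on top of detected images could be sketched as a simple post-processing filter over bounding boxes. This is an assumption-laden illustration, not the patented implementation: the `Box` type, the `process_inside_images` flag, and the 0.5 overlap threshold are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self):
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)


def overlap_fraction(inner, outer):
    """Fraction of `inner`'s area that is covered by `outer`."""
    w = min(inner.right, outer.right) - max(inner.left, outer.left)
    h = min(inner.bottom, outer.bottom) - max(inner.top, outer.top)
    if w <= 0 or h <= 0 or inner.area() == 0:
        return 0.0
    return (w * h) / inner.area()


def filter_handwriting(handwriting, images, process_inside_images=False, threshold=0.5):
    """Drop handwriting candidates that mostly overlap a detected image,
    unless the (hypothetical) option asks to keep processing them."""
    if process_inside_images:
        return list(handwriting)
    return [
        hw for hw in handwriting
        if all(overlap_fraction(hw, img) < threshold for img in images)
    ]


candidates = [Box(0, 0, 10, 10), Box(100, 100, 120, 110)]
detected_images = [Box(90, 90, 200, 200)]
print(filter_handwriting(candidates, detected_images))        # keeps only the first box
print(filter_handwriting(candidates, detected_images, True))  # keeps both boxes
```

Measuring overlap relative to the handwriting box, rather than using intersection over union, keeps small handwriting candidates inside large images from slipping through the filter.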

One challenge associated with machine learning is training set preparation. With the present embodiments, available handwriting data included a collection of handwritten documents in various languages. In preparation for training, each document was segmented into a collection of small, binarized images. FIG. 7 shows example detected words and phrases outlined in boxes 702a-j used to build the training set.
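As a rough illustration of this preparation step, the sketch below binarizes a grayscale page and crops it into small patches, one per detected box. The fixed threshold of 128, the `(left, top, right, bottom)` box format, and the function names are assumptions; the text does not say which binarization or segmentation method was used.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 for ink and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop(image, left, top, right, bottom):
    """Cut out the region [top:bottom) x [left:right) from a row-major image."""
    return [row[left:right] for row in image[top:bottom]]


def build_patches(gray, boxes, threshold=128):
    """Binarize the page once, then return one small binary image per box."""
    binary = binarize(gray, threshold)
    return [crop(binary, *box) for box in boxes]


# Tiny synthetic page: 255 is white paper, low values are dark ink.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,  10,  20, 255, 255, 255],
    [255, 255, 255, 255,  30, 255],
    [255, 255, 255, 255, 255, 255],
]
word_boxes = [(1, 1, 3, 2), (4, 2, 5, 3)]  # hypothetical detected word boxes
print(build_patches(page, word_boxes))  # prints [[[1, 1]], [[1]]]
```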

Once the handwriting had been extracted, various versions were created using image warping routines to slant the image to the left and to the right.
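One way to produce such slanted variants is a horizontal shear, sketched below on a small binary image. The shear formulation, the `slant` function name, and the choice to keep the bottom row fixed as a baseline are assumptions for illustration; the actual warping routines are not described in the text.

```python
def slant(image, shear):
    """Shear a binary image horizontally.

    Positive `shear` leans the top of the image to the right, negative to the
    left. The bottom row acts as the baseline and is not shifted relative to it.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Rows further above the baseline are shifted further sideways.
    shifts = [round(shear * (height - 1 - y)) for y in range(height)]
    lo, hi = min(shifts, default=0), max(shifts, default=0)
    out_width = width + (hi - lo)
    out = [[0] * out_width for _ in range(height)]
    for y, row in enumerate(image):
        offset = shifts[y] - lo  # re-base so every offset is non-negative
        out[y][offset:offset + width] = row
    return out


stroke = [[1], [1], [1]]  # a vertical stroke, three pixels tall
print(slant(stroke, 1.0))   # prints [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(slant(stroke, -1.0))  # prints [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Generating several shear values per extracted sample multiplies the size of the training set while leaving the original labels unchanged.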

Claims

1