
Handwriting detector, extractor, and language classifier

Patent number
US11176361B2
Publication date
2021-11-16
Applicant
Raytheon Company (Waltham, MA, US)
Inventors
Darrell L. Young; Kevin C. Holley
IPC classification
G06F40/171; G06F40/263; G06K9/00; G06K9/34; G06K9/38; G06K9/62; G06K9/68; G06K9/72
Technical field
language, may, or, in, bounding, be, hardware, features, geometric, image
Region: Waltham, MA

Abstract

Disclosed are methods for handwriting recognition. In some aspects, an image representing a page of a sample document is analyzed to identify a region having indications of handwriting. The region is analyzed to determine frequencies of a plurality of geometric features within the region. The frequencies may be compared to profiles or histograms of known language types, to determine if there are similarities between the frequencies in the sample document relative to those of the known language types. In some aspects, machine learning may be used to characterize the document as a particular language type based on the frequencies of the geometric features.
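The comparison of feature frequencies against language profiles can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four feature names (taken from the list in the claims), the cosine similarity measure, and the two made-up profiles. The source does not state which similarity measure or profile data is used.

```python
from math import sqrt

# Hypothetical feature set; the names are borrowed from the claims.
FEATURES = ["line_segments", "loops", "corners", "cross_overs"]


def normalize(counts):
    """Turn raw feature counts into relative frequencies that sum to 1."""
    total = sum(counts.get(f, 0) for f in FEATURES)
    if total == 0:
        return [0.0] * len(FEATURES)
    return [counts.get(f, 0) / total for f in FEATURES]


def cosine_similarity(a, b):
    """Cosine of the angle between two frequency vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_language(sample_counts, profiles):
    """Return (language, similarity) for the best-matching profile."""
    sample = normalize(sample_counts)
    scored = {
        lang: cosine_similarity(sample, normalize(profile))
        for lang, profile in profiles.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Made-up illustrative profiles, not measured data.
PROFILES = {
    "language_a": {"line_segments": 10, "loops": 40, "corners": 5, "cross_overs": 45},
    "language_b": {"line_segments": 50, "loops": 5, "corners": 40, "cross_overs": 5},
}

sample = {"line_segments": 12, "loops": 35, "corners": 8, "cross_overs": 45}
language, score = closest_language(sample, PROFILES)
print(language, round(score, 3))
```

A trained classifier, as the abstract also mentions, would replace `closest_language` with a model fitted on labeled frequency vectors; the nearest-profile rule is only the simplest stand-in.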

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 62/687,962, filed Jun. 21, 2018 and entitled “HANDWRITING DETECTOR, EXTRACTOR AND LANGUAGE CLASSIFIER.” The contents of this prior application are considered part of this application, and are hereby incorporated by reference in their entirety.

GOVERNMENT RIGHTS

This invention was made with Government support under government contract IS-FI-4382 awarded by the Combating Terrorism Technical Support Office. The Government has certain rights in this invention.

BACKGROUND

The written record is considered by historians as man's transition from pre-history. More importantly, handwriting (and accounting) enabled the further development of civilization with records such as agricultural yields, livestock, births, and land ownership, which in turn led to centralized management and the rise of cities. Despite the centrality of handwriting, modern information processing methods are challenged to correctly identify handwriting in all its forms. Thus, improved methods of handwriting recognition are desired.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 lists several techniques for handwriting detection and language determination.

FIG. 2 illustrates a simple process flow for language classification.

FIG. 3 illustrates the separation of the algorithm and the infrastructure.

FIG. 4 shows the results of the modified Sauvola method.
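The text above does not say how the Sauvola method was modified, so the following is only a sketch of the standard, unmodified Sauvola local threshold, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the pixels in a window around each pixel. The function names, the window radius, and the defaults k = 0.5 and R = 128 (commonly cited values for 8-bit images) are assumptions.

```python
from statistics import mean, pstdev


def sauvola_threshold(window, k=0.5, r=128.0):
    """Standard Sauvola threshold for one window of grayscale values."""
    m = mean(window)
    s = pstdev(window)  # population standard deviation of the window
    return m * (1.0 + k * (s / r - 1.0))


def binarize(image, radius=1, k=0.5, r=128.0):
    """Label each pixel 1 (background) or 0 (ink) using a local threshold.

    `image` is a list of rows of grayscale values; windows are clipped
    at the image border rather than padded.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(rows, y + radius + 1))
                for i in range(max(0, x - radius), min(cols, x + radius + 1))
            ]
            threshold = sauvola_threshold(window, k, r)
            row.append(1 if image[y][x] > threshold else 0)
        out.append(row)
    return out


# A bright page with one dark pixel in the middle (made-up values).
page = [
    [200, 200, 200],
    [200, 20, 200],
    [200, 200, 200],
]
print(binarize(page))
```

Because the threshold adapts to local mean and contrast, a dark stroke on an unevenly lit page can still be separated from its background, which is why this family of methods is popular for document images.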

Claims

What is claimed is:

1. A method performed by hardware processing circuitry, comprising:
  receiving an image;
  identifying handwriting in the image by:
    detecting a plurality of features in the image;
    detecting a subset of the plurality of features arranged linearly in the image;
    detecting a region of the image bounding the subset of the plurality of features;
    determining a probability that the region includes handwriting; and
    determining the probability is above a threshold resulting in identification of handwriting; and
  responsive to the identification of handwriting, identifying a language type of the handwriting; and
  generating, based on the region, a plurality of geometric metrics defining frequencies of geometric features within the region, the geometric features including three or more of: line segments, boxes, curves, loops, orthogonal intersections, cross-overs, corners, closed curves, or connected curves;
  providing, to a trained model, the plurality of geometric metrics; and
  determining from the trained model, the language type of handwriting within the region.

2. The method of claim 1, further comprising enhancing contrast of the image resulting in a contrast enhanced image, wherein the detecting of the plurality of features is based on the contrast enhanced image.

3. The method of claim 2, further comprising color filtering the image to remove non-blue colors resulting in a color-filtered image, wherein the detecting of the plurality of features is based on the color-filtered image.

4. The method of claim 1, wherein determining the probability that the region includes handwriting includes determining an irregularity of the subset of the plurality of features within the region, wherein the determination of the probability is based on the irregularity, the irregularity determined based on a radius of a first circle that encloses the region and a maximum radius of a second circle that is contained within the region.

5. The method of claim 1, wherein the generating of the plurality of geometric metrics comprises determining a length and a height of the region, wherein a count of a geometric feature of the geometric features occurring along the length of the region is normalized based on the height and the length of the region, the method further comprising generating a frequency of the geometric feature based on the normalized count.

6. The method of claim 1, wherein the geometric features include line segments, boxes, curves, loops, orthogonal intersections, cross-overs, corners, closed curves, and connected curves.

7. The method of claim 1, further comprising generating second geometric metrics identifying frequencies of adjoining geometric feature pairs within the region, and providing the second geometric metrics to the trained model.

8. The method of claim 1, further comprising training the model based on a database of documents, document metrics for the documents, and the language type of the document.

9. A non-transitory computer readable storage medium comprising instructions that when executed configure hardware processing circuitry to perform operations, comprising:
  receiving an image;
  identifying handwriting in the image by:
    detecting a plurality of features in the image;
    detecting a subset of the plurality of features arranged linearly in the image;
    detecting a region of the image bounding the subset of the plurality of features;
    determining a probability that the region includes handwriting; and
    determining the probability is above a threshold resulting in identification of handwriting; and
  responsive to the identification of handwriting, identifying a language of the handwriting by:
    generating, based on the region, a plurality of geometric metrics defining frequencies of geometric features within the region, the geometric features including three or more of: line segments, boxes, curves, loops, orthogonal intersections, cross-overs, corners, closed curves, or connected curves;
    providing, to a trained model, the plurality of geometric metrics; and
    determining from the trained model, the language type of handwriting within the region.

10. The non-transitory computer readable storage medium of claim 9, the operations further comprising enhancing contrast of the image resulting in an enhanced contrast image, wherein the detecting of the plurality of features is based on the enhanced contrast image.

11. The non-transitory computer readable storage medium of claim 9, the operations further comprising color filtering the image to remove non-blue colors resulting in a color-filtered image, wherein the detecting of the plurality of features is based on the color-filtered image.

12. The non-transitory computer readable storage medium of claim 9, wherein determining the probability that the region includes handwriting includes determining an irregularity of the subset of the plurality of features within the region, wherein the determination of the probability is based on the irregularity, the irregularity determined based on a radius of a first circle that encloses the region and a maximum radius of a second circle that is contained within the region.

13. The non-transitory computer readable storage medium of claim 9, wherein the generating of the plurality of geometric metrics comprises determining a length and a height of the region, wherein a count of a geometric feature of the geometric features occurring along the length of the region is normalized based on the height and the length of the region, the method further comprising generating a frequency of the geometric feature based on the normalized count.

14. The non-transitory computer readable storage medium of claim 9, wherein the geometric features include line segments, boxes, curves, loops, orthogonal intersections, cross-overs, corners, closed curves, and connected curves.

15. The non-transitory computer readable storage medium of claim 9, the operations further comprising generating second geometric metrics identifying frequencies of adjoining geometric feature pairs within the region, and providing the second geometric metrics to the trained model.

16. The non-transitory computer readable storage medium of claim 9, the operations further comprising training the model based on a database of documents, document metrics for the documents, and the language type of the document.

17. A system, comprising:
  hardware processing circuitry;
  one or more hardware memories storing instructions that when executed configure the hardware processing circuitry to perform operations comprising:
    receiving an image;
    identifying handwriting in the image by:
      detecting a plurality of features in the image;
      detecting a subset of the plurality of features arranged linearly in the image;
      detecting a region of the image bounding the subset of the plurality of features;
      determining a probability that the region includes handwriting; and
      determining the probability is above a threshold resulting in identification of handwriting; and
    responsive to the identification of handwriting, identifying a language type of the handwriting; and
    generating, based on the region, a plurality of geometric metrics defining frequencies of geometric features within the region, the geometric features including three or more of: line segments, boxes, curves, loops, orthogonal intersections, cross-overs, corners, closed curves, or connected curves;
    providing, to a trained model, the plurality of geometric metrics; and
    determining from the trained model, the language type of handwriting within the region.

18. The system of claim 17, wherein the generating of the plurality of geometric metrics comprises determining a length and a height of the region, wherein a count of a geometric feature of the geometric features occurring along the length of the region is normalized based on the height and the length of the region, the method further comprising generating a frequency of the geometric feature based on the normalized count.

19. The system of claim 17, wherein the geometric features include line segments, boxes, curves, loops, orthogonal intersections, cross-overs, corners, closed curves, or connected curves.

20. The system of claim 17, the operations further comprising generating second geometric metrics identifying frequencies of adjoining geometric feature pairs within the region, and providing the second geometric metrics to the trained model.
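Three of the quantities named in the claims can be sketched in a few lines: the irregularity built from an enclosing and an inscribed circle, the feature count normalized by region length and height, and the frequencies of adjoining feature pairs. The claims name the inputs but not the formulas, so the formulas below are assumptions: irregularity is taken as 1 minus the ratio of the two radii for an axis-aligned rectangular region, and a count is normalized to "features per height-sized step along the length". All function names are hypothetical.

```python
from collections import Counter
from math import hypot


def irregularity(width, height):
    """Assumed measure: 1 - r_inscribed / r_enclosing for a rectangle.

    A square gives the lowest value; a long, thin text line approaches 1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    r_enclosing = hypot(width, height) / 2  # circle through the four corners
    r_inscribed = min(width, height) / 2    # largest circle that fits inside
    return 1.0 - r_inscribed / r_enclosing


def normalized_frequency(count, length, height):
    """Assumed normalization: count per height-sized step along the length."""
    if length <= 0 or height <= 0:
        raise ValueError("length and height must be positive")
    return count / (length / height)


def pair_frequencies(sequence):
    """Relative frequency of each pair of adjoining features, left to right."""
    pairs = Counter(zip(sequence, sequence[1:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()} if total else {}


# Made-up example: a 300 x 30 pixel region containing 12 loops.
print(irregularity(300, 30))
print(normalized_frequency(12, 300, 30))
print(pair_frequencies(["loop", "corner", "loop", "corner"]))
```

Dividing by length over height makes the count roughly independent of writing size and line length, which is presumably why both dimensions appear in the claim; the pair frequencies add ordering information that single-feature counts discard.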