
Handwriting detector, extractor, and language classifier

Patent No.
US11176361B2
Publication Date
2021-11-16
Applicant
Raytheon Company (Waltham, MA, US)
Inventors
Darrell L. Young; Kevin C. Holley
IPC Classification
G06F40/171; G06F40/263; G06K9/00; G06K9/34; G06K9/38; G06K9/62; G06K9/68; G06K9/72
Region: Waltham, MA

Abstract

Disclosed are methods for handwriting recognition. In some aspects, an image representing a page of a sample document is analyzed to identify a region having indications of handwriting. The region is analyzed to determine the frequencies of a plurality of geometric features within it. These frequencies may be compared to profiles or histograms of known language types to determine whether the frequencies in the sample document are similar to those of a known language type. In some aspects, machine learning may be used to characterize the document as a particular language type based on the frequencies of the geometric features.
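The frequency comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. It assumes each detected geometric feature has already been encoded as an integer code. It uses histogram intersection as the similarity measure, which is an assumed choice. The function names and the example `known` histograms are hypothetical.

```python
from collections import Counter


def feature_frequencies(feature_codes):
    """Normalized frequency of each (assumed integer) geometric-feature code in a region."""
    counts = Counter(feature_codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


def histogram_intersection(h1, h2):
    """Assumed similarity measure in [0, 1]: sum of per-code minimum frequencies."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))


def most_similar_language(region_codes, language_histograms):
    """Return the known language whose histogram overlaps the region's most."""
    sample = feature_frequencies(region_codes)
    return max(
        language_histograms,
        key=lambda lang: histogram_intersection(sample, language_histograms[lang]),
    )


# Hypothetical reference histograms for two known language types.
known = {
    "lang_a": {0: 0.5, 1: 0.5},
    "lang_b": {2: 0.8, 3: 0.2},
}

print(most_similar_language([0, 1, 1, 0, 2], known))  # -> lang_a
```

Histogram intersection is convenient here because both inputs are normalized, so the score is bounded by 1. Any other histogram distance could be substituted.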

Description

Once each feature was detected and encoded into a number, the language classification process could begin using the encoded numbers. In an embodiment, one approach was based on the successful Cavnar and Trenkle technique, which was applied to characters rather than handwriting and forms histograms of n-grams to create a language profile. An n-gram is an occurrence of n features in sequence; a bi-gram, for example, is an occurrence of two features together. The letters ‘th’ are the most common character bi-gram in English. The language profile, a vector of normalized n-gram counts, is developed during training and stored for each language. During testing, n-gram test vectors of the test document are compared to the stored profile vectors, and the closest match is reported as the language. Multiple measures have been proposed for the distance between the profile vector and the test vector.
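The following sketch shows the character-based Cavnar and Trenkle scheme adapted to sequences of feature numbers. It builds a rank-ordered profile of the most frequent n-grams. It then scores a test profile with the "out-of-place" measure: the sum of rank displacements, with a maximum penalty for n-grams missing from the language profile. The n-gram sizes (1 to 3), the `top_k` cutoff, and all function names are illustrative assumptions, not details taken from this patent.

```python
from collections import Counter


def ngrams(codes, n):
    """All contiguous n-grams (as tuples) in a sequence of feature codes."""
    return [tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)]


def ranked_profile(codes, max_n=3, top_k=300):
    """Rank-ordered profile: the most frequent n-gram gets rank 0.

    max_n and top_k are assumed parameters for illustration.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(codes, n))
    # Sort by descending count; break ties by the n-gram itself so ranks are deterministic.
    ordered = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    return {gram: rank for rank, gram in enumerate(ordered)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank displacements; n-grams absent from the language profile get a max penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[g]) if g in lang_profile else max_penalty
        for g, rank in doc_profile.items()
    )


def classify(codes, lang_profiles):
    """Report the language whose profile has the smallest out-of-place distance."""
    doc = ranked_profile(codes)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Hypothetical training sequences of feature numbers for two languages.
profiles = {
    "lang_a": ranked_profile([1, 2, 1, 2, 1, 2, 3]),
    "lang_b": ranked_profile([4, 5, 4, 5, 4, 6, 6]),
}

print(classify([1, 2, 1, 2, 3], profiles))  # -> lang_a
```

Ranking rather than raw counts makes the measure insensitive to document length, which is one reason a rank-based distance is attractive for short handwritten samples.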

In various embodiments, n-grams were formed using the feature numbers. A profile n-gram histogram vector was created for each language during training. During testing, n-gram test vectors were compared to the profile n-gram histogram vectors, and the language was estimated by choosing the profile vector that best matches the test vector.
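The training and testing steps in this embodiment might look like the sketch below, which uses normalized bi-gram histograms of feature numbers. The text leaves the distance measure open, so the L1 distance here is only one possible choice. The sketch also assumes each document is supplied as a list of integer feature numbers. All helper names are hypothetical.

```python
from collections import Counter


def bigram_histogram(codes):
    """Normalized histogram of feature-number bi-grams for one test document."""
    counts = Counter(zip(codes, codes[1:]))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def train_profiles(training_docs):
    """Build one normalized bi-gram profile per language.

    training_docs (assumed format): dict mapping language -> list of feature-number sequences.
    Bi-grams are counted within each document, never across document boundaries.
    """
    profiles = {}
    for lang, docs in training_docs.items():
        counts = Counter()
        for codes in docs:
            counts.update(zip(codes, codes[1:]))
        total = sum(counts.values())
        profiles[lang] = {g: c / total for g, c in counts.items()} if total else {}
    return profiles


def l1_distance(p, q):
    """Assumed distance: sum of absolute differences over all bi-grams in either vector."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def estimate_language(codes, profiles):
    """Choose the profile vector that best matches (is closest to) the test vector."""
    test_vec = bigram_histogram(codes)
    return min(profiles, key=lambda lang: l1_distance(test_vec, profiles[lang]))


# Hypothetical training data for two languages.
profiles = train_profiles({
    "lang_a": [[1, 2, 1, 2], [2, 1, 2]],
    "lang_b": [[3, 4, 3, 4], [4, 3]],
})

print(estimate_language([1, 2, 1], profiles))  # -> lang_a
```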

Various experiments showed that this was a viable technique that could learn a language profile and match it against features extracted from never-before-seen data. However, this technique may involve coding the individual feature detectors, which may be complex.

Claims
