FIG. 6 illustrates detected regions of images and handwriting in a document 600 according to various embodiments. The document 600 is a page from a travel article about Washington, D.C. Various embodiments use a trained deep CNN to recognize handwriting, machine-printed text, and images. In an embodiment, the deep CNN is trained using a training set of printed text, handwriting, and images. The boxes 602a-d in the document 600 show handwriting detected by the deep CNN. The boxes 604a-f show detected machine-printed text. The boxes 606 show detected images. In an example, the document 600 may be represented as an image. The image may be provided as input to the deep CNN. The deep CNN may then detect images inside the overall document image. Similar to the line cut problem, image 610 may be falsely detected as handwriting. In some embodiments, a configuration option is provided that selects whether to continue to process candidate handwriting on top of detected images. For example, via the configuration option, an embodiment may be configured either to ignore or to process handwriting within an image.
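The configuration option described above can be illustrated with a minimal sketch. The box format (x1, y1, x2, y2), the function names, and the overlap threshold are assumptions for illustration only; the source does not specify how candidate handwriting regions are compared against detected image regions.

```python
def overlap_fraction(box, other):
    """Fraction of `box` area covered by `other`; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], other[0]), max(box[1], other[1])
    ix2, iy2 = min(box[2], other[2]), min(box[3], other[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area else 0.0


def filter_handwriting(handwriting_boxes, image_boxes,
                       process_inside_images=False, threshold=0.5):
    """Drop handwriting candidates that lie mostly on top of a detected
    image, unless the configuration option selects processing them."""
    if process_inside_images:
        return list(handwriting_boxes)
    return [hw for hw in handwriting_boxes
            if all(overlap_fraction(hw, img) < threshold
                   for img in image_boxes)]
```

With `process_inside_images=False`, a candidate such as image 610 that falls within a detected image region would be ignored; with the option enabled, all candidates pass through for handwriting processing.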
One challenge associated with machine learning is training set preparation. For the present embodiments, the available handwriting data included a collection of handwritten documents in various languages. In preparation for training, each document was segmented into a collection of small, binarized images. FIG. 7 shows example detected words and phrases, outlined in boxes 702a-j, that were used to build the training set.
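The binarization step can be sketched as follows. The source does not state the thresholding method used; the mean-intensity heuristic below is an assumption for illustration, standing in for whatever binarization routine was actually applied to the segmented crops.

```python
def binarize(gray, threshold=None):
    """Binarize a grayscale crop (list of rows of 0-255 ints) to 0/1.

    If no threshold is given, the mean intensity is used as a simple
    heuristic (an assumption; the source does not specify the method).
    Ink (dark pixels) maps to 1, background to 0.
    """
    pixels = [p for row in gray for p in row]
    if threshold is None:
        threshold = sum(pixels) / len(pixels)
    return [[1 if p < threshold else 0 for p in row] for row in gray]
```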
Once the handwriting had been extracted, additional versions of each image were created using image warping routines to slant the image to the left and to the right, thereby augmenting the training set.
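The slanting augmentation can be sketched as a simple horizontal shear applied to a binarized crop. The row-shift approximation below is an assumption for illustration; the source does not specify which warping routines were used, only that images were slanted left and right.

```python
def slant(image, shear=0.2, fill=0):
    """Slant a binarized image (list of equal-length rows) by shifting each
    row horizontally in proportion to its distance from the bottom row,
    approximating an affine shear. Positive `shear` slants right; negative
    slants left. Vacated pixels are filled with `fill` (background).
    """
    h, w = len(image), len(image[0])
    out = []
    for y, row in enumerate(image):
        offset = round(shear * (h - 1 - y))  # top rows shift furthest
        if offset >= 0:
            new_row = [fill] * min(offset, w) + row[:max(w - offset, 0)]
        else:
            new_row = row[-offset:] + [fill] * min(-offset, w)
        out.append(new_row[:w])
    return out
```

Applying `slant(img, shear)` and `slant(img, -shear)` to each extracted crop yields the left- and right-slanted variants described above.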