Extract only the handwriting text from a pdf using OCR

Question

I am working on OCR handwriting-to-text conversion using the Google Cloud Vision API.

The input is a PDF with 1-5 pages. The catch is that each page can have a default header and footer printed on it, and in between them is an answer handwritten by the student.

I am doing this in Node.js, but I am open to other suggestions if they give accurate results.

The issue is that Google OCR converts everything into plain text without indicating which parts were printed text and which were handwriting.

Is there any way I can achieve that?

One condition: the headers and footers are unknown and can be different for each uploaded PDF. They can also be of different sizes, so I can't remove them from the PDF automatically. — Lakshay Nagpal, May 09 '23 at 19:51

score 0 · Answer 1 · answered May 12 '23 at 17:44

A feature request has been filed asking for the Google Cloud Vision API's DOCUMENT_TEXT_DETECTION to indicate whether detected text is handwritten or typed/printed. As of this answer, the API does not report that distinction, and many users have been waiting for it. To stay updated, you can follow the feature request or check the release notes.

In the meantime, you can achieve this with third-party tools such as ABBYY, Nanonets OCR API, Kofax, etc.
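If you want to stay on Cloud Vision, one possible workaround is to use the fact that the printed header and footer usually repeat on every page of the same PDF, while the handwritten answer does not. The sketch below is only a heuristic, not a Vision API feature: it walks objects shaped like the pages, blocks, paragraphs, words and symbols of `fullTextAnnotation` and drops any block whose text shows up on every page. The helper names (`blockText`, `normalize`, `dropRepeatedBlocks`) are made up for illustration, and it assumes you have already collected one page object per PDF page. It cannot help with single-page PDFs, and it will wrongly drop a handwritten block that happens to be identical on every page.

```javascript
// Rebuild a block's text from the blocks -> paragraphs -> words -> symbols hierarchy.
function blockText(block) {
  return (block.paragraphs || [])
    .map((p) =>
      (p.words || [])
        .map((w) => (w.symbols || []).map((s) => s.text).join(""))
        .join(" ")
    )
    .join(" ");
}

// Normalize so case, spacing and page numbers do not defeat the comparison.
function normalize(text) {
  return text.toLowerCase().replace(/\d+/g, "#").replace(/\s+/g, " ").trim();
}

// Hypothetical helper: for each page, return the block texts that do NOT
// appear (after normalization) on every page of the document.
function dropRepeatedBlocks(pages) {
  const perPage = pages.map((page) => (page.blocks || []).map(blockText));
  if (pages.length < 2) return perPage; // nothing to compare against

  // Count on how many pages each normalized block text occurs.
  const counts = new Map();
  for (const texts of perPage) {
    for (const key of new Set(texts.map(normalize))) {
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }

  // Keep only blocks that are missing from at least one page.
  return perPage.map((texts) =>
    texts.filter((t) => counts.get(normalize(t)) < pages.length)
  );
}
```

Calling `dropRepeatedBlocks(pages)` returns one array of strings per page. Requiring a block to appear on every page is deliberately strict; you could loosen it to "most pages" if some pages lack the footer, at the cost of more false positives.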
