Is there a way to predict document title from Google Cloud Vision OCR?

Question

What I need help with is a way to predict document title from the OCR text which Google Cloud Vision extracts from a pdf/jpg file.

I have a jpg file which I am sending to the Vision API, and I get the OCR text back. For the image attached, how could I programmatically predict that the title of the document is "Piano Posture Checklist"?

score 1 · Answer 1 · answered Sep 22 '21 at 02:00

The response you get when detecting text with the Vision API (a TextAnnotation) is structured as TextAnnotation -> Page -> Block (text block, table block, etc.) -> Paragraph -> Word -> Symbol. Beyond the text and its position, the properties attached to these elements are limited to things like the detected language and the detected break (space, hyphen, line break). None of them marks a semantic role, so the Vision API cannot by itself tell you that a particular piece of text is the "Title" of the document. See the TextAnnotation reference.
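To make the hierarchy concrete, here is a minimal sketch that walks a response and rebuilds the text of each paragraph. It assumes you have the `fullTextAnnotation` part of the REST JSON response loaded as a plain Python dict; the function names `iter_paragraphs` and `paragraph_text` are made up for this example and are not part of any Google library.

```python
# Break types that should become a space when rebuilding text.
# A HYPHEN break means the word continues on the next line, so no space is added.
_SPACE_BREAKS = {"SPACE", "SURE_SPACE", "EOL_SURE_SPACE", "LINE_BREAK"}


def iter_paragraphs(full_text_annotation):
    """Yield every paragraph dict, walking Page -> Block -> Paragraph."""
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                yield paragraph


def paragraph_text(paragraph):
    """Rebuild a paragraph's text from its Word -> Symbol leaves."""
    parts = []
    for word in paragraph.get("words", []):
        for symbol in word.get("symbols", []):
            parts.append(symbol.get("text", ""))
            # The detected break, if any, hangs off the symbol's "property".
            detected = symbol.get("property", {}).get("detectedBreak", {})
            if detected.get("type") in _SPACE_BREAKS:
                parts.append(" ")
    return "".join(parts).strip()
```

Calling `[paragraph_text(p) for p in iter_paragraphs(annotation)]` gives you one string per paragraph, but nothing in the structure says which of them is the title.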

If you want a prediction as specific as the "Title" of a document/image, I suggest using AutoML Vision, where you can train a model on a set of properly labeled documents/images. Once training is done, you can send a prediction request to predict the "Title".

You can refer to this document for an example on how to prepare a dataset, train a model and predict.

Thanks for the suggestion! – Smijo Thekkudan, Oct 25 '21 at 20:28

wescpy · Answer 2 · 2021-11-11T00:22:28.650

You want to "predict document title." There are 2 possible scenarios here:

1. Either you want to predict the correct document title based on the title itself appearing somewhere in the document, or
2. You want to predict the title based on the (OCR'd) contents because the document didn't/doesn't come with a title.

For #1, I agree w/the response from Ricco: you should build a custom model on top of Cloud Vision just for your application, i.e., train one with AutoML (well, AutoML Vision) to suit your needs, e.g., getting the title out of an OCR'd doc based on title placement/location, font size, etc.
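Before training anything for #1, you could try a rule-based baseline using the same signals (font size and placement). This is only a rough sketch, not the AutoML approach: it assumes the `fullTextAnnotation` REST JSON as a plain dict, uses word bounding-box height as a stand-in for font size, and `guess_title` is a hypothetical helper name.

```python
from statistics import median


def _ys(box):
    # REST JSON may omit a coordinate when it is 0, so default to 0.
    return [v.get("y", 0) for v in box.get("vertices", [])]


def _height(box):
    ys = _ys(box)
    return max(ys) - min(ys) if ys else 0


def _top(box):
    ys = _ys(box)
    return min(ys) if ys else 0


def _text(paragraph):
    # Join words with single spaces; hyphenated line breaks are not handled.
    words = []
    for word in paragraph.get("words", []):
        words.append("".join(s.get("text", "") for s in word.get("symbols", [])))
    return " ".join(words)


def guess_title(full_text_annotation):
    """Return the text of the paragraph with the tallest words (ties: topmost)."""
    candidates = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                heights = [_height(w.get("boundingBox", {}))
                           for w in paragraph.get("words", [])]
                if not heights:
                    continue
                # Larger median word height first, then smaller top coordinate.
                top = _top(paragraph.get("boundingBox", {}))
                candidates.append((median(heights), -top, _text(paragraph)))
    return max(candidates)[2] if candidates else None
```

This will fail on documents where the title isn't the largest text (letterheads, logos, rotated scans), which is where a trained model would earn its keep.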

More advanced is #2. You would probably have to use a pair of APIs: OCR with Cloud Vision (w/ or w/o AutoML), then analysis of the text using NLU via Cloud Natural Language (or AutoML Natural Language if needed) to possibly autogenerate a title based on the contents if a document didn't come w/one. I believe in this case your training will likely have to lean towards supervised learning, where you provide titles paired w/untitled documents in your training data.
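As a stopgap for #2 while you collect training data, you could build a crude keyword-style title from the most salient entities in the text. To be clear, this is a simple substitute and not the supervised approach described above. The sketch assumes you have already sent the OCR text to Cloud Natural Language's entity analysis and hold its JSON response as a dict with an `entities` list whose items carry `name` and `salience`; `title_from_entities` and `max_terms` are names invented for this example.

```python
def title_from_entities(entities_response, max_terms=3):
    """Join the most salient distinct entity names into a placeholder title."""
    entities = entities_response.get("entities", [])
    # Highest salience first; sorted() is stable, so ties keep response order.
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)

    seen = set()
    terms = []
    for entity in ranked:
        name = entity.get("name", "").strip()
        key = name.lower()
        if not name or key in seen:
            continue  # skip empty names and case-insensitive repeats
        seen.add(key)
        terms.append(name)
        if len(terms) == max_terms:
            break
    return ", ".join(terms) if terms else None
```

The result is a list of topics rather than a fluent title, so treat it as a baseline to compare a trained model against.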

I am looking for the second case you mentioned. Thanks for the suggestions! I will try it out. — Smijo Thekkudan, Oct 25 '21 at 20:29
