Is it possible to use both bounding boxes and block containers in Google Cloud Vision OCR?

Question

My goal is to use Google Cloud Vision to identify handwritten paragraphs and their bounding boxes.

I am trying to use the "Document Text Detection" feature of Google's Cloud Vision API, which returns handwritten text split into blocks, paragraphs, words, and symbols. This works well. Here is a link to the documentation

Likewise, I can use Google's "Text Detection" API to locate coordinates for a bounding box. Here is a link to the documentation

My problem is that these two features do not seem to be compatible, and I cannot run both at once (I am looking for a bounding box around a paragraph).

**Does anyone know how to get bounding boxes with "Document Text Detection", or block containers with "Text Detection", in Google Cloud Vision?**

So far I haven't managed to use these two features together (I'm using Python, FYI).

Cheers. Any help is appreciated.

score 0 · Answer 1 · answered Apr 07 '23 at 12:55

I don't know if I understood your question correctly, so please correct me if I got it wrong.

If I understand correctly, you want the handwritten text detected together with bounding boxes that locate the text within the image.

To do that, as you can quickly test in the Try It section of the Vision AI documentation, you simply need to handle the API response. You will get a response fragment like this:

"textAnnotations": [
        {
          "locale": "en",
          "description": "Google Claud\nPlatform",
          "boundingPoly": {
            "vertices": [
              {
                "x": 296,
                "y": 68
              },
              {
                "x": 712,
                "y": 68
              },
              {
                "x": 712,
                "y": 253
              },
              {
                "x": 296,
                "y": 253
              }
            ]
          }
        }

According to the Python SDK documentation for Vision AI, those x and y values are the vertices of the polygon that encloses the text.
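As a rough illustration of handling that response, here is a minimal sketch that works on the JSON response once it has been parsed into a Python dict. The helper names `poly_to_box` and `paragraph_boxes` are my own, not part of the SDK. The second helper assumes the response also carries a `fullTextAnnotation` with the pages, blocks, paragraphs, words, and symbols hierarchy mentioned in the question, where each paragraph has its own `boundingBox`. It also assumes a vertex may omit `x` or `y` when the value is 0, so missing coordinates are treated as 0.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in image pixel coordinates
Box = Tuple[int, int, int, int]


def poly_to_box(poly: Dict) -> Box:
    """Reduce a polygon dict ({"vertices": [...]}) to an axis-aligned box."""
    vertices = poly.get("vertices", [])
    if not vertices:
        raise ValueError("polygon has no vertices")
    # Assumption: a coordinate equal to 0 may be absent from the JSON.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def paragraph_boxes(response: Dict) -> List[Tuple[str, Box]]:
    """Collect (text, box) for every paragraph in fullTextAnnotation."""
    results = []
    full_text = response.get("fullTextAnnotation", {})
    for page in full_text.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                # Rebuild each word from its symbols; joining words with a
                # single space is a simplification that ignores the
                # detected line and word breaks.
                words = [
                    "".join(s.get("text", "") for s in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                box = poly_to_box(paragraph["boundingBox"])
                results.append((" ".join(words), box))
    return results


# The polygon from the response fragment above.
sample_poly = {
    "vertices": [
        {"x": 296, "y": 68},
        {"x": 712, "y": 68},
        {"x": 712, "y": 253},
        {"x": 296, "y": 253},
    ]
}
print(poly_to_box(sample_poly))  # (296, 68, 712, 253)
```

If that assumption about the response holds, calling `paragraph_boxes` on the parsed response gives one text and box pair per paragraph, which is the combination the question asks for. Note that the polygon is not necessarily axis-aligned for rotated text, so the min/max box can be larger than the polygon itself.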
