
For example, I'd like to detect a coded string like "A5b1x" written in handwriting. I've tried either splitting it up manually so that I have an image of each character, or having Vision recognize the full string directly. Neither is working so far, as I'm not sure how to tell the API that the input is not natural language (or that it consists of individual characters). This is what I ran on a Google Compute Engine instance:

gcloud ml vision detect-document "weblink to image"

No result for image of "g": g

No result for image of "e": e

Result for image of "fxb3": fxb3

{
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "blocks": [
              {
                "blockType": "TEXT",
                "boundingBox": {
                  "vertices": [
                    {
                      "x": 2433,
                      "y": 1289
                    },
                    {
                      "x": 1498,
                      "y": 1336
                    },
                    {
                      "x": 1468,
                      "y": 737
                    },
                    {
                      "x": 2403,
                      "y": 691
                    }
                  ]
                },
                "confidence": 0.56,
                "paragraphs": [
                  {
                    "boundingBox": {
                      "vertices": [
                        {
                          "x": 2433,
                          "y": 1289
                        },
                        {
                          "x": 1498,
                          "y": 1336
                        },
                        {
                          "x": 1468,
                          "y": 737
                        },
                        {
                          "x": 2403,
                          "y": 691
                        }
                      ]
                    },
                    "confidence": 0.56,
                    "words": [
                      {
                        "boundingBox": {
                          "vertices": [
                            {
                              "x": 2433,
                              "y": 1289
                            },
                            {
                              "x": 1498,
                              "y": 1336
                            },
                            {
                              "x": 1468,
                              "y": 737
                            },
                            {
                              "x": 2403,
                              "y": 691
                            }
                          ]
                        },
                        "confidence": 0.56,
                        "symbols": [
                          {
                            "boundingBox": {
                              "vertices": [
                                {
                                  "x": 2433,
                                  "y": 1289
                                },
                                {
                                  "x": 2135,
                                  "y": 1304
                                },
                                {
                                  "x": 2105,
                                  "y": 706
                                },
                                {
                                  "x": 2403,
                                  "y": 691
                                }
                              ]
                            },
                            "confidence": 0.4,
                            "text": "\u0967"
                          },
                          {
                            "boundingBox": {
                              "vertices": [
                                {
                                  "x": 2063,
                                  "y": 1308
                                },
                                {
                                  "x": 1788,
                                  "y": 1322
                                },
                                {
                                  "x": 1758,
                                  "y": 723
                                },
                                {
                                  "x": 2033,
                                  "y": 710
                                }
                              ]
                            },
                            "confidence": 0.62,
                            "text": "\u0967"
                          },
                          {
                            "boundingBox": {
                              "vertices": [
                                {
                                  "x": 1750,
                                  "y": 1323
                                },
                                {
                                  "x": 1498,
                                  "y": 1336
                                },
                                {
                                  "x": 1468,
                                  "y": 737
                                },
                                {
                                  "x": 1720,
                                  "y": 725
                                }
                              ]
                            },
                            "confidence": 0.67,
                            "property": {
                              "detectedBreak": {
                                "type": "LINE_BREAK"
                              }
                            },
                            "text": "X"
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ],
            "height": 2112,
            "width": 4608
          }
        ],
        "text": "\u0967\u0967X\n"
      },
      "textAnnotations": [
        {
          "boundingPoly": {
            "vertices": [
              {
                "x": 1467,
                "y": 690
              },
              {
                "x": 2432,
                "y": 690
              },
              {
                "x": 2432,
                "y": 1335
              },
              {
                "x": 1467,
                "y": 1335
              }
            ]
          },
          "description": "\u0967\u0967X\n",
          "locale": "und"
        },
        {
          "boundingPoly": {
            "vertices": [
              {
                "x": 2433,
                "y": 1289
              },
              {
                "x": 1498,
                "y": 1336
              },
              {
                "x": 1468,
                "y": 737
              },
              {
                "x": 2403,
                "y": 691
              }
            ]
          },
          "description": "\u0967\u0967X"
        }
      ]
    }
  ]
}
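As an aside, the per-symbol text and confidence values in a fullTextAnnotation response like the one above are nested several levels deep (pages → blocks → paragraphs → words → symbols). Here is a minimal sketch of walking that hierarchy in Python, using a trimmed-down copy of the response above as sample data:

```python
import json

# Trimmed excerpt of a DOCUMENT_TEXT_DETECTION response, shaped like the one above.
response = json.loads("""
{
  "responses": [{
    "fullTextAnnotation": {
      "pages": [{
        "blocks": [{
          "paragraphs": [{
            "words": [{
              "symbols": [
                {"text": "\\u0967", "confidence": 0.4},
                {"text": "\\u0967", "confidence": 0.62},
                {"text": "X", "confidence": 0.67}
              ]
            }]
          }]
        }]
      }]
    }
  }]
}
""")

# Walk pages -> blocks -> paragraphs -> words -> symbols and collect each
# detected character together with its confidence.
symbols = []
for resp in response["responses"]:
    for page in resp["fullTextAnnotation"]["pages"]:
        for block in page["blocks"]:
            for paragraph in block["paragraphs"]:
                for word in paragraph["words"]:
                    for symbol in word["symbols"]:
                        symbols.append((symbol["text"], symbol["confidence"]))

print(symbols)
```

Low per-symbol confidences (like the 0.4 on the first symbol here) are often the quickest signal that the handwriting was not read reliably.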
  • Can you explain what response you are getting when trying to detect "A5b1x" ? Are you using DOCUMENT_TEXT_DETECTION when calling the Vision API? Can you provide a snippet of the code you are using and an image that you are passing for OCR so that we could have an idea where the issue might be? – Philipp Sh Jan 16 '19 at 10:43
  • Thanks, added some images with the line I used. No results for any – Rithwik Sudharsan Jan 17 '19 at 19:49
  • Any tips @PhilippSh? – Rithwik Sudharsan Jan 17 '19 at 23:35

1 Answer


The Google Cloud Vision API is not able to recognise single characters at this point. There is a feature request for character recognition submitted here; please star it to receive updates, and do not hesitate to add comments describing the desired implementation.

With respect to your question about recognising "coded" strings, the Vision API is able to do that. I successfully passed an image with fxb3 to the API and the results were good (here is image1 and image2). The response you are getting for your image is two consecutive Unicode characters followed by "X". The quality of the handwriting is what is causing the poor response: the OCR model is constantly being improved, but at this point it cannot reliably read what might be considered rather unclear handwriting.
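For context on those Unicode results: U+0967 is the Devanagari digit one (१), so the API read the string as "११X". A quick way to inspect what came back, and to flag non-ASCII detections if your codes are known to use only ASCII letters and digits:

```python
import unicodedata

# The "text" field from the fullTextAnnotation above.
detected = "\u0967\u0967X\n"

# Name each detected character so misreads are obvious.
for ch in detected.strip():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0967 DEVANAGARI DIGIT ONE
# U+0967 DEVANAGARI DIGIT ONE
# U+0058 LATIN CAPITAL LETTER X

# If the expected codes are ASCII-only, anything outside that range
# can be flagged instead of silently accepted.
ascii_only = all(ord(ch) < 128 for ch in detected.strip())
print(ascii_only)  # False
```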

Philipp Sh
  • Understood; it is in beta, I think, so that makes sense. I did see those Unicode results. If I give the API a hint that the language is EN, would it use only English characters, improving the accuracy? – Rithwik Sudharsan Jan 18 '19 at 15:49
  • I have tried with and without the use of the languageHints parameter, and there certainly is a difference. Setting English should aid in most cases, but it all depends on the quality of the handwriting, and on some occasions it might even provide worse results when specifying English for languageHints. – Philipp Sh Jan 29 '19 at 08:35
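For readers hitting the same issue: the languageHints parameter mentioned above is set per request via the imageContext field of the images:annotate request body. A minimal sketch of such a body (the bucket URI is a placeholder, not a real file):

```python
import json

# Sketch of an images:annotate request body with an English language hint.
# "gs://my-bucket/handwriting.jpg" is a placeholder image URI.
request_body = {
    "requests": [{
        "image": {"source": {"imageUri": "gs://my-bucket/handwriting.jpg"}},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
        "imageContext": {"languageHints": ["en"]},
    }]
}

print(json.dumps(request_body, indent=2))
```

This body would be POSTed to the Vision API's annotate endpoint; as the comment above notes, the hint usually helps but can occasionally make results worse on unclear handwriting.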