Google Vision API - Split OCR Results to Different Lines?

Question

I'm trying to use the Google Vision API in C# for an image with text on multiple lines. I want each line to be a separate string, but the API puts it all into 1 string.

I tried filtering by capitals at the beginning, but some lines have capitals at the beginning of each word, so it's not always just at the beginning of each line.

How can I change it so that it takes in each line separately? Since all the lines are in the same place in the image each time, could I crop it using C# to get each line individually?

Thanks :)

Do you have a sample image and snippet of code you are calling? — Kevin Le, Feb 07 '18 at 19:33
@KevinLe It's pretty simple, here's a simple pic I made in MS Paint: https://imgur.com/a/MtAqx - That's basically what it looks like, and I need the top phrase on multiple lines to be 1 string, and each of the 3 lines below to be separate strings as well. Here's what I'm doing currently: https://hastebin.com/ikiyuwatuk.cs — NateDev, Feb 07 '18 at 21:13
Looks like you were going with this documentation https://cloud.google.com/vision/docs/detecting-text rather than this one https://cloud.google.com/vision/docs/detecting-fulltext. I posted an answer below! — Kevin Le, Feb 07 '18 at 21:29

Kevin Le · Answer 1 · 2018-02-07T23:36:43.210

It looks like you are using the "TEXT_DETECTION" mode rather than the "DOCUMENT_TEXT_DETECTION" mode of the Google Vision API.

https://cloud.google.com/vision/docs/ocr explains the differences between the two.

Based on https://cloud.google.com/vision/docs/detecting-fulltext, this is what your code should look like if you are using "DOCUMENT_TEXT_DETECTION":

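using System;
using System.Linq;
using Google.Cloud.Vision.V1;

// filePath is the path to your image file.
var image = Image.FromFile(filePath);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            // A Word is a list of Symbols, so build each word from its symbols,
            // then join the words so every paragraph prints as one string.
            var words = paragraph.Words.Select(w => string.Concat(w.Symbols.Select(s => s.Text)));
            Console.WriteLine(string.Join(" ", words));
        }
    }
}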
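
If you also need the individual lines inside a paragraph, the full-text response records whether the OCR saw a space or a line end after each symbol (a "detected break"). Here is a minimal sketch of splitting on those breaks. BreakKind and LineSplitter are hypothetical names I made up so the snippet runs without the client library. As far as I know, in Google.Cloud.Vision.V1 the break is read from symbol.Property?.DetectedBreak?.Type, where Space and SureSpace would map to BreakKind.Space, and EolSureSpace and LineBreak to BreakKind.EndOfLine. Treat that mapping as an assumption and check it against your own response.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the break the API reports after a symbol.
public enum BreakKind { None, Space, EndOfLine }

public static class LineSplitter
{
    // Builds one string per line from (symbol text, break after symbol) pairs.
    public static List<string> SplitIntoLines(
        IEnumerable<(string Text, BreakKind Break)> symbols)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        foreach (var (text, brk) in symbols)
        {
            current.Append(text);

            if (brk == BreakKind.Space)
            {
                // Word boundary inside the same line.
                current.Append(' ');
            }
            else if (brk == BreakKind.EndOfLine)
            {
                // Line finished: store it and start a new one.
                lines.Add(current.ToString());
                current.Clear();
            }
        }

        // Keep any text left over when the last symbol has no end-of-line break.
        if (current.Length > 0)
        {
            lines.Add(current.ToString().TrimEnd());
        }

        return lines;
    }
}
```

For example, passing the symbols "H", "i" (end of line), "a" (space), "b" gives two strings, "Hi" and "a b".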

Hope that helps!

Edit

I sent a POST to https://vision.googleapis.com/v1/images:annotate?key=[API_KEY] with the body

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.imgur.com/5t34img.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}

and received this valid response: https://gist.github.com/kle622/02d4d573c2c8bc08beac25a26b81096e I can help more if you post your updated code :)

So that gets me each letter, but how do I get separate strings? For example on the test image I had, I get this: https://hastebin.com/citesovuwa.pl where I can see the text elements, but then how do I separate the 4 statements? (The 1 at the top on multiple lines and then the 3 options below) My apologies if I'm missing something obvious here haha. @KevinLe — NateDev, Feb 07 '18 at 22:13
Updated the answer with my exact call to the API and response. Really odd you are getting that, I seem to be getting what you want in my response. Maybe update your code to what you have now? @NateDev — Kevin Le, Feb 07 '18 at 23:37
