
I have several examples of images I need to recognize with OCR.

I've tried to recognize them on the demo page https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/ and it works quite well. I use the "Read text in images" option, which works even better than "Read handwritten text from images".

But when I try to make the REST call from a script (following the example given in the documentation), the results are much worse. Some letters are recognized incorrectly, and some are missed entirely. If I run the same example from the development console https://westcentralus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fc/console I still get the same bad results.

What can cause this difference? How can I fix it to get results as reliable as those the demo page produces?

Please let me know if any additional information is required.

UPD: since I couldn't find any solution, or even an explanation of the difference, I've created a sample file (similar to the actual files) so you can have a look. The file URL is http://sfiles.herokuapp.com/sample.png

If this file is used on the demo page https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/ in the "Read text in images" section, the resulting JSON is:

{
  "status": "Succeeded",
  "succeeded": true,
  "failed": false,
  "finished": true,
  "recognitionResult": {
    "lines": [
      {
        "boundingBox": [
          307,
          159,
          385,
          158,
          386,
          173,
          308,
          174
        ],
        "text": "October 2011",
        "words": [
          {
            "boundingBox": [
              308,
              160,
              357,
              160,
              357,
              174,
              308,
              175
            ],
            "text": "October"
          },
          {
            "boundingBox": [
              357,
              160,
              387,
              159,
              387,
              174,
              357,
              174
            ],
            "text": "2011"
          }
        ]
      },
      {
        "boundingBox": [
          426,
          157,
          519,
          158,
          519,
          173,
          425,
          172
        ],
        "text": "07UC14PII0244",
        "words": [
          {
            "boundingBox": [
              426,
              160,
              520,
              159,
              520,
              174,
              426,
              174
            ],
            "text": "07UC14PII0244"
          }
        ]
      }
    ]
  }
}

If I use this file in the console and make the following call:

POST https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=unk&detectOrientation =true HTTP/1.1
Host: westcentralus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••

{"url":"http://sfiles.herokuapp.com/sample.png"}

I get a different result:

{
  "language": "el",
  "textAngle": 0.0,
  "orientation": "Up",
  "regions": [{
    "boundingBox": "309,161,75,10",
    "lines": [{
      "boundingBox": "309,161,75,10",
      "words": [{
        "boundingBox": "309,161,46,10",
        "text": "October"
      }, {
        "boundingBox": "358,162,26,9",
        "text": "2011"
      }]
    }]
  }, {
    "boundingBox": "428,161,92,10",
    "lines": [{
      "boundingBox": "428,161,92,10",
      "words": [{
        "boundingBox": "428,161,92,10",
        "text": "071_lC14P110244"
      }]
    }]
  }]
}

As you can see, the result is totally different (even the JSON format). Does anyone know what I'm doing wrong, or whether I'm missing something and the "Read text in images" demo simply does not match the ocr method of the API?

I will be very grateful for any help.

Anton
  • What does your REST call look like? Are the parameters in-line with what's sent by the web console? – Mo A Jan 15 '19 at 15:13
  • @MoA The console displays such a request preview: `POST https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=unk&detectOrientation =true HTTP/1.1 Host: westcentralus.api.cognitive.microsoft.com Content-Type: application/json Ocp-Apim-Subscription-Key: •••••••••••••••••••••••••••••••• {"url":"http://url.of/the/file.png"}` This exact file url is used on the demo page and works significantly better. The results are the same in console and in my script, I think if I make it work from the console the problem is solved. – Anton Jan 15 '19 at 15:18
  • @MoA I've just updated the question with sample image and sample calls, so maybe it can help to understand what's wrong? – Anton Jan 18 '19 at 16:08

1 Answer


There are two flavors of OCR in Microsoft Cognitive Services. The newer endpoint (/recognizeText) has better recognition capabilities, but currently only supports English. The older endpoint (/ocr) has broader language coverage. The demo page's "Read text in images" feature uses the newer endpoint, while your request goes to /ocr, which explains both the drop in accuracy and the different JSON shape.

Some additional details about the differences are in this post.
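
If English text is acceptable for your images, a minimal sketch of calling the newer endpoint from a script could look like the following (Python with the requests library; the subscription key is a placeholder, and the flow assumes the asynchronous v2.0 Recognize Text API: POST the image, then poll the URL returned in the Operation-Location header):

import time
import requests

subscription_key = "<your-subscription-key>"  # placeholder
base = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0"
headers = {
    "Content-Type": "application/json",
    "Ocp-Apim-Subscription-Key": subscription_key,
}

# Submit the image; mode is "Printed" for printed text or "Handwritten".
submit = requests.post(
    base + "/recognizeText",
    params={"mode": "Printed"},
    headers=headers,
    json={"url": "http://sfiles.herokuapp.com/sample.png"},
)
submit.raise_for_status()
operation_url = submit.headers["Operation-Location"]

# Poll until the operation completes, then print the recognized lines.
while True:
    result = requests.get(operation_url, headers=headers).json()
    if result.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(1)

for line in result["recognitionResult"]["lines"]:
    print(line["text"])

Note that the JSON returned by this endpoint has the recognitionResult/lines shape shown in your demo output above, not the regions/lines/words shape returned by /ocr.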

cthrash