3

I am working on a project where I am doing OCR on text on a label. My job is to deskew the image to make it readable with tesseract.

this one

I have been using this approach: it converts the picture to greyscale and thresholds it, collects the coordinates of the black pixels, fits a minAreaRect around them, and then rotates the image by that rectangle's skew angle. This works on blindtext images, but not on images with a background, like the one shown above. There, it calculates a skew angle of 0.0 and does not rotate the image. (Expected result: 17°)

black pixels in the background

I suspect this happens because there are black pixels in the background. Because of them the minAreaRect goes around the whole picture, thus leading to a skew angle of 0.

I tried doing a background removal, but couldn't find a method that works well enough to leave only the label with the text.

Another approach I tried was clustering the pixels with k-means clustering. But even when choosing a good k manually, the cluster with the text still contains parts of the background.

See here.

Not to mention that I would still need another method that goes through all the clusters and uses some sort of heuristic to determine which cluster is text and which is background, which would cost a lot of runtime.

What is the best way to deskew an image that has a background?

Imogenio
  • 31
  • 3
  • 2
    Will those labels always have that QR code there? – Dan Mašek Jan 12 '22 at 22:03
  • Can't you just threshold on white to get the label. Then get the region around the text inside the label or just deskew that label after making the background also white using the code from your reference. – fmw42 Jan 13 '22 at 00:57
  • I don't understand what you mean by thresholding on white, can you elaborate on that? From what I gathered, your suggestion is to perform the minAreaRect on the white pixels in order to get the label. However, as you can see in the thresholded image, there is also a lot of white background at the top of the image which would make the results useless – Imogenio Jan 13 '22 at 09:11
  • 2
    perhaps throw a text *detection* method at this first. -- I would also like to know if those QR codes are always there. you didn't answer that question yet. -- what is "blindtext"? -- why is this image cropped so close? the label's corners are cut off. – Christoph Rackwitz Jan 13 '22 at 09:32
  • Unfortunately, tesseract text detection only works if the text is in the correct skew. Or is there a text localization method that also works on skewed text? I researched a bit into that but could not find anything – Imogenio Jan 13 '22 at 09:53
  • @DanMašek No, not necessarily. – Imogenio Jan 13 '22 at 14:09

2 Answers

4

You can try deep-learning-based natural scene text detection methods. These give you a rotated bounding box for each piece of text. From those boxes, compute the rotated bounding rectangle that covers all of them, then use the 4 corners of that rectangle to correct the image.

RRPN_plusplus

Based on the sample image, RRPN_plusplus seems to do quite well at extreme angles.


EAST

PyImageSearch has a tutorial on the EAST scene text detector, though I am not sure EAST will do well with extreme angles.

https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/

enter image description here

Image from https://github.com/argman/EAST.

These should help you find recent better repos and methods,

B200011011
  • 3,798
  • 22
  • 33
0

You could use a fast cross-platform command like

deskew32 -o out1.png -a 20 -f b1 -g c Sdgqm.png


Or, for more complex cases, combine it with dewarp, but that will need a third step, as the auto thresholding is not upper and lower:

dewarping mmrnt.png square.png 0 0


K J
  • 8,045
  • 3
  • 14
  • 36