I'm using ImageMagick to pre-process a receipt image before passing it to the Tesseract OCR engine to extract the text. I've removed noise from the image using:

convert input.png -colorspace gray \
  \( +clone -blur 0x2 \) +swap -compose divide -composite \
  -linear-stretch 5%x0%   photocopy.png

Now I need to crop out the area containing the text. ImageMagick has a masking feature for removing borders from an image, but in my case creating a mask does not seem to work because the backgrounds of the receipt images are non-uniform.
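
To make the goal concrete, here is a minimal sketch of the kind of bounding-box crop in question. It is not a working solution for every receipt: it assumes the cleanup command above left a near-white background (in particular, white corners), and the function name `crop_to_text`, the blur radius, the threshold, and the output file name are all illustrative guesses rather than tuned values.

```shell
#!/bin/sh
# Sketch only: crop an image to the bounding box of its dark (text) pixels.
# Assumption: the background is near-white after the cleanup step, so the
# corner pixels can serve as the "border" colour for the trim box.
crop_to_text() {
  in="$1"
  out="$2"
  # Blur so nearby glyphs merge into blobs, binarize, then ask ImageMagick
  # for the trim bounding box. %@ prints WxH+X+Y and does not modify the image.
  box=$(convert "$in" -colorspace gray -blur 0x5 -threshold 90% -format '%@' info:)
  # Apply that box to the original (unblurred) image and reset the page offset.
  convert "$in" -crop "$box" +repage "$out"
}
```

It would be called as `crop_to_text photocopy.png cropped.png`. If part of the dark surface behind the receipt survives the cleanup, the trim box will include it, which is exactly the non-uniform background problem described above.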

I've read about the Stroke Width Transform (SWT) for identifying text in natural images. Can something like this be done with ImageMagick (or another handy image-processing tool for developers) to locate the text so that the borders can be omitted? Thanks in advance.

mane
Sanjay Sharma
