
I want to extract useful information from images of bills.

I have already converted the images to text using OCR (pytesseract), and I am extracting the information based on specific keywords like "total", "amount", etc.

What would be the best generic approach for handling various types of unstructured bills to extract the place of the bill and the amount?
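The keyword-based extraction described above might look roughly like the following. This is a minimal sketch, assuming the OCR text is already available as a string (for example from `pytesseract.image_to_string`). The keyword list, the regex, and the function name `extract_total` are illustrative assumptions, not part of the original question.

```python
import re
from typing import Optional

# Keywords that typically appear on the line holding the bill total.
# This list is an assumption and would need tuning for real bills.
TOTAL_KEYWORDS = ("total", "amount", "balance due")

# A money value: optional currency symbol, digits with optional thousands
# separators, and an optional two-digit decimal part (captured in group 1).
AMOUNT_RE = re.compile(r"[$€£₹]?\s*(\d+(?:,\d{3})*(?:\.\d{2})?)")


def extract_total(ocr_text: str) -> Optional[float]:
    """Return the last money value found on lines that mention a total keyword."""
    candidates = []
    for line in ocr_text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in TOTAL_KEYWORDS):
            matches = AMOUNT_RE.findall(line)
            if matches:
                # The figure at the end of the line is usually the amount itself.
                candidates.append(float(matches[-1].replace(",", "")))
    # "Subtotal" also contains "total" and usually comes first, so prefer the last hit.
    return candidates[-1] if candidates else None


sample = "Cafe Roma\nSubtotal 20.00\nTax 2.50\nTotal $22.50\nThank you"
print(extract_total(sample))
```

Preferring the last matching line is a heuristic: it works for the common "Subtotal / Tax / Total" layout, but bills that print the total near the top would need a different rule.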

Sourabh Potnis
  • Could you please provide some images? – flamelite Feb 05 '18 at 10:08
  • Sample image: https://media-cdn.tripadvisor.com/media/photo-s/0b/4a/df/b0/receipt-for-our-meal.jpg – Sourabh Potnis Feb 05 '18 at 10:15
  • I think you have probably extracted the bill data successfully from this image, but for unstructured images you can use regex matching to filter the bill data based on the currency symbol and digits. – flamelite Feb 05 '18 at 10:17
  • Yes, I am extracting with regex. But my challenge is handling poor-quality images and logos, where OCR fails to convert them to text correctly. – Sourabh Potnis Feb 05 '18 at 10:25
  • If you have a blurred image where pixel data is lost or occluded, then you cannot recover that lost information in any way. – flamelite Feb 05 '18 at 10:28
  • By poor quality, I meant images where OCR fails to extract the information even though humans can easily read them and extract the data. – Sourabh Potnis Feb 05 '18 at 10:32
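For images that a human can read but Tesseract cannot, one common preprocessing step (not suggested in the thread itself) is to binarize the image before OCR. It cannot restore detail that blur has destroyed, but it often helps with low contrast or tinted paper. The sketch below implements Otsu's method in plain Python on a flat list of 8-bit grayscale values so the logic is visible; in practice you would more likely call OpenCV's `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, or `cv2.adaptiveThreshold` for uneven lighting. The function names here are illustrative.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0-255) that maximizes between-class variance,
    where one class is pixels <= t and the other is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg = 0.0      # sum of intensities at or below t
    weight_bg = 0     # number of pixels at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map dark pixels (ink) to 0 and light pixels (paper) to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


# Synthetic "receipt": two dark ink shades and two light paper shades.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [220] * 50
t = otsu_threshold(pixels)
print(t, binarize(pixels, t)[:3], binarize(pixels, t)[-3:])
```

Otsu's method assumes a roughly bimodal histogram (ink vs. paper); receipts photographed with shadows usually violate that, which is why adaptive (local) thresholding is often the better choice for them.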

1 Answer


For recognizing unstructured bills that vary in rotation, translation, and lighting, deep learning models are usually the best option.

First, train your network on, say, n images that cover these different kinds of variation.

For example, let's say we need to find the location of the word "amount":

image 1: Image of a bill with the word "amount" placed at top-right corner

image 2: Image of a bill with the word "amount" placed at top-left corner

....

These will be your input training samples. Your output will be the location coordinates of the word "amount".
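Framed this way, each training pair is (bill image, box around the word "amount"), which is a bounding-box regression problem. The sketch below shows one way such labels might be represented and how a predicted location could be scored against the ground truth. The `Box` type, the normalization to [0, 1], the example file name, and the use of intersection-over-union (IoU) as the metric are assumptions added for illustration; the answer itself does not specify a label format or metric.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates; (x_min, y_min) is the top-left corner."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so an empty or inverted box has no area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def normalize(box: Box, width: int, height: int) -> Tuple[float, float, float, float]:
    """Scale coordinates to [0, 1] so bills of different sizes share one target range."""
    return (box.x_min / width, box.y_min / height, box.x_max / width, box.y_max / height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


# Hypothetical training sample: a 500x800 bill with "amount" near the top-right.
label = Box(410, 20, 480, 40)
print(normalize(label, 500, 800))
print(iou(label, Box(415, 22, 485, 42)))
```

Normalizing the targets keeps the regression output in a fixed range regardless of image resolution, and IoU gives a size-independent way to tell whether the network's predicted location is close enough to count as a hit.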

You can check this link to learn more about creating a deep learning model.

janu777