I wish to extract key-value pairs from the following image that consists of 2 invoices.

Image example Click here for image

I am using AWS Textract to achieve this; however, I'd like to be able to map the key-value pairs back to the invoices. For example, 'Cornbread SVC' should be mapped to bill #1 and '1 #1 CHKN PLATE' should be mapped to bill #2.

One approach I thought of was to pre-process the image: find the number of bills and their coordinates, then crop the image to each bill's dimensions. So an image with 5 bills would yield 5 sets of coordinates, and the original image would be cropped 5 times, once per bill. Each bill would then be sent to AWS Textract as a separate image.

However, I have not been able to figure out a method to detect the number of bills in an image and their boundary coordinates.

Any help would be appreciated. I am open to using any other APIs or methods to achieve this.

enamoria

1 Answer

As you've already mentioned, it would be necessary to split the bills before you do any OCR. There are a few techniques to achieve this.

You could use OpenCV to detect the white paper in the image. In my experience, this works when the background of the image is dark enough. It won't work when the picture is taken on, for example, a white table. The user experience with this approach therefore won't be satisfying: sometimes it works, sometimes it doesn't.

If it is a mobile app, you could ask the user to draw a rectangle around each receipt. Mobile scanner apps use a similar approach for a single document.
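If you go this way, the rectangles are usually drawn on a scaled-down preview, so they have to be mapped back to the full-resolution image before cropping. A rough sketch of that step, assuming rectangles arrive as `(x, y, width, height)` in preview pixels; the helper `to_crop_box`, the `pad` margin and the sizes below are hypothetical and only for illustration. The result uses the `(left, top, right, bottom)` order that Pillow's `Image.crop` accepts.

```python
def to_crop_box(rect, preview_size, image_size, pad=0):
    """Map a rectangle drawn on a preview to a pixel crop box on the full image.

    rect: (x, y, width, height) in preview pixels.
    preview_size, image_size: (width, height) tuples.
    Returns (left, top, right, bottom), clamped to the image bounds.
    """
    x, y, w, h = rect
    sx = image_size[0] / preview_size[0]
    sy = image_size[1] / preview_size[1]
    # Scale to full resolution, add a small margin, then clamp to the image.
    left = max(0, int(x * sx) - pad)
    top = max(0, int(y * sy) - pad)
    right = min(image_size[0], int((x + w) * sx) + pad)
    bottom = min(image_size[1], int((y + h) * sy) + pad)
    if right <= left or bottom <= top:
        raise ValueError("rectangle lies outside the image")
    return left, top, right, bottom


# Two rectangles drawn on a 400x600 preview of a 1600x2400 photo.
drawn = [(10, 20, 150, 300), (200, 40, 160, 280)]
crop_boxes = [to_crop_box(r, (400, 600), (1600, 2400), pad=8) for r in drawn]
```

Each crop box then identifies one receipt, so whatever the OCR returns for that crop belongs to that receipt by construction.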

The last option, which I prefer, is to use a scanning app/SDK and require the user to take a picture of one receipt at a time. It may sound a bit rigid and uncool, but it works all the time. Let's face it: the more steps you have that can fail, the more failures will happen. In the invoice data extraction process you have at least the following steps:

  • image capture
  • image processing
  • OCR - not 100% accurate
  • recognition of data (what is invoice number, etc.) - not 100% accurate

So you already have at least two steps that are not 100% accurate. Why add another step that cannot work in 100% of cases when you can get the same result by taking separate images?

Jan Giacomelli