
I am currently working on scanning invoices with OCR. All invoices use the "OCRB" font and have the same formatting.

The bottom of a sample invoice looks like this:

[image: bottom of a sample invoice]

This is what the user needs to scan.

I have tried many different libraries to detect what I want, but most of them don't give me the correct result. The best result came from Firebase ML Vision text recognition, but the output I get is this:

[image: text recognized by Firebase ML Vision]

I can calculate whether the values are correct, except for the amount in the middle. In this case it is read as "3557 00", but if the user moves the camera a bit further to the right, the result I get is "557 00". Since both ML Kit and the other libraries crop tightly around each word, I have no way of knowing whether the full sum was captured.

If I got a single space before the word, I could tell that it is a complete "word", in this case the full sum.

Does anyone have ideas on which library to use to get the best result?
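
One possible substitute for the missing leading space, assuming the recognizer returns a bounding box for each word and that the truncation happens because the word runs off the edge of the frame or scan window, is to reject any word whose box sits too close to an edge. This is only a sketch: `Box`, `is_fully_in_frame`, and the 2% margin are hypothetical names and values, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def is_fully_in_frame(box: Box, frame_width: int, frame_height: int,
                      margin_ratio: float = 0.02) -> bool:
    """Return True if the box keeps a margin from every edge of the frame.

    A box that touches an edge may belong to a word that was cut off,
    so its text should not be trusted as a complete value.
    """
    margin_x = frame_width * margin_ratio
    margin_y = frame_height * margin_ratio
    return (box.left >= margin_x
            and box.top >= margin_y
            and box.right <= frame_width - margin_x
            and box.bottom <= frame_height - margin_y)


# Made-up coordinates in a 1000x400 frame, for illustration only.
complete = Box(left=400, top=300, right=560, bottom=340)  # well inside
clipped = Box(left=0, top=300, right=130, bottom=340)     # touches left edge
```

With this check, a reading such as "557 00" would only be accepted when its box is clear of the frame edges; otherwise the app would keep scanning.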

Giovanni Palusa
  • 1
    Asking for libraries is off-topic and your question will get closed by the community. The problem probably isn't what library you are using anyway, it can detect the text, but at certain angles it loses accuracy, they will likely all do this. You probably need to improve your logic. For example, you could add additional logic to check that you can see the edges of the page, that you detect 5 'blocks' at the bottom, that the edges of the page indicate that the camera is pointing directly at the page instead of at an angle etc etc. Way too much to explain here – Scriptable May 10 '19 at 08:50
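
Part of the logic suggested in the comment above can be sketched as follows: accept a frame only when exactly five blocks are found and they sit on roughly one horizontal line. The `Block` tuple layout, `read_bottom_line`, and the 10-pixel skew tolerance are assumptions made for illustration; a real OCR library will expose its results in its own types.

```python
from typing import List, Optional, Tuple

# (text, left, top, right, bottom) in pixels; a hypothetical stand-in for
# whatever the OCR library returns.
Block = Tuple[str, int, int, int, int]

EXPECTED_BLOCKS = 5  # the comment above expects five blocks at the bottom


def read_bottom_line(blocks: List[Block],
                     max_skew_px: int = 10) -> Optional[List[str]]:
    """Return block texts from left to right, or None if the frame looks unreliable."""
    if len(blocks) != EXPECTED_BLOCKS:
        return None  # a block is missing, merged, or split
    ordered = sorted(blocks, key=lambda b: b[1])  # sort by left coordinate
    tops = [b[2] for b in ordered]
    if max(tops) - min(tops) > max_skew_px:
        return None  # blocks are not level, so the camera is probably tilted
    return [b[0] for b in ordered]
```

Returning `None` instead of a partial result lets the caller simply wait for the next camera frame rather than act on a reading that may be incomplete.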

0 Answers