
I'm having a problem extracting data from receipts/bills. I'm using a ready-made API to extract text from images. The extracted text is in French and does not come back in any consistent order (neither top-down nor left-to-right). I have already extracted some information: dates, company name, total amount after tax, and currency.

I'm stuck on extracting the individual tax percentages, the tax amount for each percentage, and the total amount before tax. So far I can get a list of every amount present in a document, but I still can't tell tax amounts, unit prices, the total tax, and so on apart. The only thing I know for sure is that the largest amount extracted is always the total amount after tax. Can anyone help me figure out a way to extract the tax percentages as well as the tax amounts? I've put an example here (in French, but the same applies to English). The amounts extracted from this picture are:

0.0, 11.59, 18.55, 22.0, 55.0, 289.25, 350.0, 491.58, 780.83,
1391.25, 1446.25, 1958.0, 2000.0, 2607.75, 4915.75, 5142.83, 6362.0, 7142.83

What I want to get is:

'5.5%':  0.00
'10%':  491.58
'20%':  289.25
'total tax':  780.83
'total before tax':  6362.00
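
One idea that might be worth trying is to use the arithmetic between the amounts themselves: a tax amount should be roughly its base amount times the rate, and the bases plus the taxes should add up to the largest amount. Below is a minimal sketch of that check, not a finished solution. It assumes the candidate rates are known in advance (here 5.5%, 10% and 20%, taken from the example), and the function name `match_tax_lines` and the 0.01 rounding tolerance are my own placeholders.

```python
def match_tax_lines(amounts, rates, tol=0.01):
    """For each rate, look for a (base, tax) pair in `amounts`
    such that tax is within `tol` of base * rate."""
    matches = {}
    for rate in rates:
        candidates = [
            (base, tax)
            for base in amounts
            for tax in amounts
            if abs(base * rate - tax) <= tol
        ]
        if candidates:
            # (0.0, 0.0) matches every rate, so prefer the largest base found.
            matches[rate] = max(candidates)
    return matches


# Amounts extracted from the example picture.
amounts = [
    0.0, 11.59, 18.55, 22.0, 55.0, 289.25, 350.0, 491.58, 780.83,
    1391.25, 1446.25, 1958.0, 2000.0, 2607.75, 4915.75, 5142.83,
    6362.0, 7142.83,
]
rates = [0.055, 0.10, 0.20]  # assumed candidate rates, not detected from the text

lines = match_tax_lines(amounts, rates)
total_tax = round(sum(tax for _, tax in lines.values()), 2)
total_before = round(sum(base for base, _ in lines.values()), 2)

# Sanity check: bases + taxes should equal the largest amount (total after tax).
consistent = abs(total_before + total_tax - max(amounts)) <= 0.01

for rate, (base, tax) in lines.items():
    print(f"{rate:.1%}: base={base:.2f} tax={tax:.2f}")
print("total tax:", total_tax, "total before tax:", total_before, "consistent:", consistent)
```

On the amounts listed above, this pairs 4915.75 with 491.58 at 10% and 1446.25 with 289.25 at 20%, and only finds the 0.0 line for 5.5%. It could easily produce false matches on documents with many amounts, so the final consistency check against the largest amount is probably the part that matters most.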

PS: I've tried extracting tables from the image to get more structured text, but got no results worth mentioning (and not all bills contain tables with vertical and horizontal lines).
