I am trying to build an OCR-based solution that extracts data from invoice images and PDFs and produces a JSON file from it. I am currently using Tesseract OCR to extract the data, but the accuracy is very low. Should I work on training Tesseract, or use a different OCR engine? And even after doing all this, can I reach very high accuracy (~95%)? Invoice processing requires very high precision.
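To make the ~95% target concrete, here is a minimal sketch of one way accuracy could be measured. It assumes character-level accuracy computed from the edit distance between a hand-typed ground truth and the OCR output. The function names `edit_distance` and `char_accuracy` are hypothetical, and field-level accuracy on the final JSON would be a different and stricter measure.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Character accuracy = 1 - (edit distance / ground-truth length),
    clamped to 0 so very noisy output cannot go negative."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ground_truth, ocr_text)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Hypothetical example strings: one digit misread in a 17-character field.
    print(round(char_accuracy("Invoice No: 12345", "Invoice No: 12845"), 4))
```

Note that a single misread digit still leaves this field at about 94% character accuracy even though the extracted invoice number is wrong, which is why a per-field exact-match rate may be the more useful number for invoices.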
I tried using the Tesseract OCR engine to extract data from invoice images, and I also applied image pre-processing techniques such as binarisation, thresholding, orientation correction, and Gaussian blur to improve accuracy, but that didn't help much either.
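For context on what the binarisation step does, here is a minimal pure-Python sketch. It assumes Otsu's global threshold, which is only one possible method; the text above does not say which thresholding variant was used. The names `otsu_threshold` and `binarize` are hypothetical, and in practice an image library would do this far faster.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximises the between-class
    variance of 8-bit grayscale values; values <= t form one class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is in the dark class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows of 0..255 ints) to 0/255."""
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny made-up image: dark text pixels (~10) on a light page (~200).
    sample = [[10, 12, 200], [11, 210, 205]]
    print(binarize(sample))
```

A single global threshold like this tends to struggle with uneven lighting or shaded table cells, both common on scanned invoices, so a locally adaptive threshold may behave differently on the same image.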