
I want to extract table information from OCR data. I have the raw OCR text. I tried pytesseract but couldn't find a working implementation for table extraction.

Here is an image: https://drive.google.com/open?id=1CGJwbmf5snoXvwlQAsRAxIRRixbT_Q8l

I tried this: https://github.com/WZBSocialScienceCenter/pdftabextract

This method didn't work for me at all.

I need this table from the OCR data in a tabular structure for further processing.

1 Answer


pdftabextract is not an OCR tool. It requires scanned pages that already carry OCR information, i.e. a "sandwich PDF" that contains both the scanned images and the recognized text. You need software like Tesseract or ABBYY FineReader to do the OCR step first.

Please try Tesseract; it is relatively easy to set up and use.
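As a rough sketch of how the OCR output could be turned into rows and cells: pytesseract's `image_to_data` returns a bounding box for every recognized word, and those boxes can be grouped by position. The helper names `words_to_rows` and `ocr_table`, and the pixel thresholds `row_tol` and `col_gap`, are my own assumptions for illustration; you will probably need to tune them for your image.

```python
from typing import Dict, List


def words_to_rows(data: Dict[str, list], row_tol: int = 10, col_gap: int = 40) -> List[List[str]]:
    """Group OCR word boxes into rows of cells (hypothetical helper).

    `data` is expected to have the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    `row_tol` and `col_gap` are illustrative pixel thresholds, not tuned values.
    """
    words = []
    for text, left, top, width, height in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        if text.strip():  # skip the empty entries Tesseract emits for blocks/lines
            words.append((top + height / 2, left, left + width, text.strip()))
    words.sort(key=lambda w: (w[0], w[1]))

    # Words whose vertical centers are close to the row's first word share a row.
    rows: List[list] = []
    for w in words:
        if rows and abs(w[0] - rows[-1][0][0]) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])

    # Within a row, a wide horizontal gap starts a new cell; a small gap
    # means the word belongs to the same cell as the previous one.
    table = []
    for row in rows:
        row.sort(key=lambda w: w[1])
        cells = [row[0][3]]
        prev_right = row[0][2]
        for _, left, right, text in row[1:]:
            if left - prev_right > col_gap:
                cells.append(text)
            else:
                cells[-1] += " " + text
            prev_right = right
        table.append(cells)
    return table


def ocr_table(image_path: str) -> List[List[str]]:
    """Run Tesseract on an image and group the words into a table."""
    # Imported here so words_to_rows can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return words_to_rows(data)
```

Calling `ocr_table("table.png")` (the file name is a placeholder) gives a list of rows, each a list of cell strings, which you can pass to `csv.writer` or `pandas.DataFrame`. This only works reasonably when the columns are separated by clear gaps; ruled or densely packed tables may need line detection first.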

sarath s