I want to extract table information from OCR Data

Question

I want to extract table information from OCR data. I have the raw text from the OCR output. I tried pytesseract but couldn't find a working implementation for table extraction.

Here is an image: https://drive.google.com/open?id=1CGJwbmf5snoXvwlQAsRAxIRRixbT_Q8l

I tried this: https://github.com/WZBSocialScienceCenter/pdftabextract

This method didn't work for me at all.

I need this table in a tabular structure, built from the OCR data, for further processing.

score 0 · Answer 1 · answered Jan 20 '19 at 05:29

pdftabextract is not an OCR tool. It requires scanned pages that already carry OCR information, i.e. a "sandwich PDF" that contains both the scanned images and the recognized text. You need software like Tesseract or ABBYY FineReader to do the OCR itself.

Please try Tesseract; it is relatively easy to set up and use.
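As a rough illustration of how Tesseract output could be turned into rows and cells, here is a minimal sketch. It assumes pytesseract and Pillow are installed and uses `pytesseract.image_to_data` to get word positions. The helper names (`ocr_image`, `words_from_ocr`, `group_rows`) and the pixel thresholds `row_tol` and `col_gap` are hypothetical and would need tuning for your image; this is not how pdftabextract works, just a simple position-based grouping.

```python
from typing import Dict, List, Tuple

# (left, top, width, text) for one recognized word; layout is an assumption
Word = Tuple[int, int, int, str]


def ocr_image(path: str) -> Dict[str, list]:
    """Run Tesseract on an image and return per-word boxes as a dict."""
    # Imported lazily so the grouping helpers below work without Tesseract.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )


def words_from_ocr(data: Dict[str, list], min_conf: float = 0.0) -> List[Word]:
    """Keep non-empty words whose confidence is at least min_conf."""
    words = []
    for left, top, width, conf, text in zip(
        data["left"], data["top"], data["width"], data["conf"], data["text"]
    ):
        if text.strip() and float(conf) >= min_conf:
            words.append((int(left), int(top), int(width), text.strip()))
    return words


def group_rows(words: List[Word], row_tol: int = 10, col_gap: int = 25) -> List[List[str]]:
    """Group words into rows by vertical position, then into cells by horizontal gap."""
    rows = []  # each entry: (top of the first word in the row, [words])
    for w in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(w[1] - rows[-1][0]) <= row_tol:
            rows[-1][1].append(w)
        else:
            rows.append((w[1], [w]))

    table = []
    for _, row_words in rows:
        cells: List[str] = []
        prev_right = None
        for left, _top, width, text in sorted(row_words):
            if prev_right is not None and left - prev_right <= col_gap:
                cells[-1] += " " + text  # small gap: same cell
            else:
                cells.append(text)  # large gap: start a new cell
            prev_right = left + width
        table.append(cells)
    return table
```

Usage would look like `table = group_rows(words_from_ocr(ocr_image("table.png")))`, where `table.png` is a placeholder file name. The result is a list of rows, each a list of cell strings. Tables with ruled lines, skewed scans, or wrapped cell text usually need more than this, such as deskewing or line detection before grouping.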
