Converting a PDF document that includes tables to a CSV file using Python or any other language

Question

I tried to convert a PDF document that includes tables into a CSV file, but without success. I have tried the following approaches:

Used pdfminer to convert the PDF to text first, but the structure of the text file did not match that of the PDF.
Used PyPDF2 to convert the PDF to text first, but the structure of the text file did not match that of the PDF.
Used pdftotext to convert the PDF to text first, but the structure of the text file did not match that of the PDF.
Used slate to convert the PDF to text first, but the structure of the text file did not match that of the PDF.

Please tell me an appropriate way to convert a PDF to a CSV file. Some people have recommended parsing the document to XML first and then converting that to CSV, but even that did not give me a solution.

The PDF document looks as follows:

Are there any better tools that can convert a PDF document containing complex tables to a CSV file?

Solutions in Python would be highly appreciated.

Could you link a page of the document; the image doesn't tell me quite enough about the formatting? — Ari Cooper-Davis, Mar 31 '17 at 08:06

score 0 · Answer 1 · answered Mar 31 '17 at 22:00

It might be worth giving PDFTables a try. They have a Python library/API for PDF-to-CSV conversion, and you get some free pages to try it out.
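For what it's worth, here is a rough sketch of how a call through their Python client might look. It assumes the `pdftables_api` package is installed and that you have an API key from a PDFTables account; the `pdf_to_csv` wrapper and its `client` parameter are just illustrative names of mine, not part of their library. The only library calls used are `pdftables_api.Client(api_key)` and its `csv(input_path, output_path)` method.

```python
from pathlib import Path


def pdf_to_csv(pdf_path, csv_path, api_key=None, client=None):
    """Convert one PDF to CSV through a PDFTables-style client.

    `client` is any object exposing csv(input_path, output_path).
    If it is omitted, a pdftables_api.Client is built from `api_key`.
    """
    if client is None:
        # Assumption: the third-party `pdftables_api` package is installed
        # and `api_key` is a valid key for the PDFTables service.
        import pdftables_api
        client = pdftables_api.Client(api_key)

    pdf_path, csv_path = Path(pdf_path), Path(csv_path)
    if not pdf_path.is_file():
        # Fail early instead of sending a bad request to the service.
        raise FileNotFoundError(pdf_path)

    # The client uploads the PDF and writes the extracted tables as CSV.
    client.csv(str(pdf_path), str(csv_path))
    return csv_path
```

You would call it as `pdf_to_csv("report.pdf", "report.csv", api_key="my-api-key")`, where the file names and key are placeholders. How well it copes with complex tables will depend on the document, so it is worth checking the output on a few pages first.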

tristanojbacon
