
I tried to convert a PDF document (which includes tables) into a CSV file. Unfortunately, I failed. I have tried the following approaches:

  1. I used pdfminer to convert the PDF to text first, but the structure of the text file was not the same as that of the PDF file.

  2. I used PyPDF2 to convert the PDF to text first, but the structure of the text file was not the same as that of the PDF file.

  3. I used pdftotext to convert the PDF to text first, but the structure of the text file was not the same as that of the PDF file.

  4. I used slate to convert the PDF to text first, but the structure of the text file was not the same as that of the PDF file.
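All four approaches most likely fail for the same reason: plain-text extraction throws away each word's position on the page, and position is what defines a table's rows and columns. Below is a minimal sketch of rebuilding them, assuming you can get word boxes as dicts with `text`, `x0` and `top` keys (pdfplumber's `Page.extract_words()` returns dicts in that shape; pdfminer's layout analysis also exposes bounding boxes, though it measures y from the bottom of the page). The sample words and the `column_edges` values are hypothetical; in practice you would pick the edges by inspecting where each column starts.

```python
import bisect
import csv
import io

def words_to_rows(words, y_tolerance=3.0):
    """Group word boxes into rows of words whose 'top' values are close.

    Assumption: each word is a dict with 'text', 'x0' and 'top' keys,
    with 'top' measured from the top of the page.
    """
    rows = []  # list of (row_top, [words])
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(word["top"] - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(word)  # close enough: same table row
        else:
            rows.append((word["top"], [word]))  # start a new row
    # Within each row, order words left to right.
    return [sorted(ws, key=lambda w: w["x0"]) for _, ws in rows]

def rows_to_cells(rows, column_edges):
    """Assign words to columns using the left x-coordinate of each column.

    column_edges is hypothetical: choose it by looking at where columns start.
    Words that fall into the same cell are joined with a space.
    """
    table = []
    for row in rows:
        cells = [[] for _ in column_edges]
        for word in row:
            # Last edge <= x0; anything left of the first edge goes to column 0.
            idx = max(bisect.bisect_right(column_edges, word["x0"]) - 1, 0)
            cells[idx].append(word["text"])
        table.append([" ".join(cell) for cell in cells])
    return table

def table_to_csv(table):
    """Serialize rows with the csv module so commas inside cells get quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(table)
    return buf.getvalue()

# Made-up word boxes standing in for a small table on one page.
sample_words = [
    {"text": "Name", "x0": 10, "top": 50},
    {"text": "Amount", "x0": 120, "top": 50.5},
    {"text": "Foo", "x0": 10, "top": 70},
    {"text": "Bar", "x0": 40, "top": 70.2},
    {"text": "1,200", "x0": 125, "top": 69.8},
]
table = rows_to_cells(words_to_rows(sample_words), column_edges=[0, 100])
print(table_to_csv(table), end="")
```

Here "Foo" and "Bar" land in the same cell because both start left of x = 100, and "1,200" is quoted in the CSV output so its comma is not mistaken for a delimiter. To write a file instead of a string, pass `open("out.csv", "w", newline="")` (file name hypothetical) to `csv.writer`.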

Kindly tell me the appropriate way to convert a PDF to a CSV file. Some people have recommended parsing the document to an XML file and then converting that to CSV, but even then I did not get a solution.
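The XML route can work, but only if the XML keeps position information. Assuming you use poppler's `pdftohtml -xml` tool, each piece of text becomes a `<text>` element with `top` and `left` attributes inside a `<page>` element, so rows can be rebuilt by grouping on `top` and ordering by `left`. A minimal sketch, with a made-up XML snippet standing in for the real file:

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

def pdf2xml_to_rows(xml_text, y_tolerance=2):
    """Turn pdftohtml-style XML into rows of cell strings.

    Assumption: <page> elements contain <text top=".." left=".."> elements,
    as in poppler's `pdftohtml -xml` output.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        by_top = defaultdict(list)  # row key (top coordinate) -> [(left, text)]
        for el in page.iter("text"):
            top = float(el.get("top"))
            # Reuse an existing row key if this element is within y_tolerance of it.
            key = next((k for k in by_top if abs(k - top) <= y_tolerance), top)
            # itertext() also collects text nested in <b>/<i> tags.
            text = "".join(el.itertext()).strip()
            by_top[key].append((float(el.get("left")), text))
        # Rows top to bottom, cells left to right.
        for key in sorted(by_top):
            rows.append([text for _, text in sorted(by_top[key])])
    return rows

def rows_to_csv(rows):
    """Serialize rows with the csv module so commas inside cells get quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Made-up snippet in the assumed pdftohtml -xml shape.
SAMPLE_XML = """<pdf2xml>
<page number="1" top="0" left="0" height="800" width="600">
<text top="100" left="50" width="40" height="15" font="0"><b>Name</b></text>
<text top="101" left="200" width="50" height="15" font="0"><b>Amount</b></text>
<text top="120" left="50" width="30" height="15" font="0">Foo</text>
<text top="120" left="200" width="40" height="15" font="0">1,200</text>
</page>
</pdf2xml>"""

rows = pdf2xml_to_rows(SAMPLE_XML)
print(rows_to_csv(rows), end="")
```

This sketch emits one CSV cell per `<text>` element, so empty table cells, or a cell whose text is split across several elements, will shift values between columns; complex tables usually need explicit column boundaries on top of this.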

The PDF document looks as follows:

[Image of the PDF document]

Are there any better tools that can convert a PDF document (including complex tables) to a CSV file?

Solutions in Python would be highly appreciated.

Martin Evans
Umair.P

1 Answer


It might be worth giving PDFTables a try: they have a Python library/API for PDF-to-CSV conversion, and you get some free pages to try it out.

tristanojbacon