Converting PDF document to DataFrame

Question

I have a PDF document with 388 pages and one table per page. I am trying to convert the tables to Excel or to multiple DataFrames, but I am having difficulties: I have tried the PyPDF2 and tabula libraries, but extraction stops after only one page. The data looks like this:

All pages have the same layout but with different industry names and numbers.

So far, the best results I have got are with:

import tabula
import pandas as pd

df= pd.DataFrame()
df = tabula.read_pdf("FSA.pdf",multiple_tables=True)

tabula.convert_into("FSA.pdf", "fsa_report.csv", output_format="csv",multiple_tables=True)
print(df)

But it stops after completing page 1. Any help?

Can you at least share the PDF? How do you expect anyone to run your program? — AMC, Dec 06 '19 at 06:13

score 2 · Accepted Answer · answered Dec 06 '19 at 06:28

df = tabula.read_pdf(file, lattice=True, pages=2, multiple_tables=True)
tabula.convert_into(file, "fsa_report.csv", output_format="csv", pages=3, multiple_tables=True)

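Use these lines. You need to specify which pages to read with the pages argument, because tabula reads only page 1 by default. pages accepts a single page number, a list of page numbers, a range string such as "1-3", or "all" for every page.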
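For the 388-page case in the question, a minimal sketch might look like the following. It assumes tabula-py and a Java runtime are installed, that the file is named "FSA.pdf" as in the question, and that every page yields a table with the same columns. The helper names read_all_pages and combine_tables and the table_no column are made up for illustration and are not part of tabula.

```python
import pandas as pd


def read_all_pages(path):
    """Return a list of DataFrames, one per table found in the PDF.

    Hypothetical helper: tabula is imported lazily so the rest of the
    module can be used without tabula-py or Java installed.
    """
    import tabula

    # pages="all" is the key change: without it only page 1 is read.
    return tabula.read_pdf(path, pages="all", lattice=True, multiple_tables=True)


def combine_tables(tables):
    """Stack the per-page tables into one DataFrame.

    Adds a 'table_no' column (an illustrative name) so each row can be
    traced back to the table it came from, counting from 1.
    """
    frames = [t.assign(table_no=i) for i, t in enumerate(tables, start=1)]
    return pd.concat(frames, ignore_index=True)


# Intended usage (not executed here, since it needs the PDF and Java):
#   tables = read_all_pages("FSA.pdf")
#   combined = combine_tables(tables)
#   combined.to_csv("fsa_report.csv", index=False)
```

If the tables have no ruling lines, lattice=True may find nothing; stream=True is the alternative extraction mode worth trying in that case.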

PrakashT
