
I have a PDF document with 388 pages and one table per page. I am trying to convert the tables to Excel or to multiple DataFrames, but I am having some difficulties: I have tried the PyPDF2 and tabula libraries, but extraction stops after only one page.

All pages have the same layout, but with a different industry name and different numbers.

So far, the best results I have got are with:

import tabula
import pandas as pd

df = pd.DataFrame()
df = tabula.read_pdf("FSA.pdf",multiple_tables=True)

tabula.convert_into("FSA.pdf", "fsa_report.csv", output_format="csv",multiple_tables=True)
print(df)

But it stops after completing page 1. Any help?

Equan Ur Rehman

1 Answer

import tabula
file = "FSA.pdf"
df = tabula.read_pdf(file, lattice=True, pages="all", multiple_tables=True)
tabula.convert_into(file, "fsa_report.csv", output_format="csv", pages="all", multiple_tables=True)

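Use these lines. You need to specify the pages argument, because tabula reads only page 1 by default. Pass a page number, a list of pages, a range string such as "1-388", or pages="all" to process every page.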
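For a PDF this long, it may be easier to read the pages in batches than in one call. The following is only a sketch, not part of the original answer: `page_batches` and `extract_all_tables` are hypothetical helper names, the batch size of 100 is an arbitrary choice, and the code assumes tabula-py and a Java runtime are installed. It relies on the fact that `pages` accepts range strings such as "1-100".

```python
def page_batches(total_pages, batch_size):
    """Return page-range strings such as "1-100" that cover 1..total_pages."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        # A batch holding a single page is written as "5" rather than "5-5".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


def extract_all_tables(pdf_path, total_pages, batch_size=100):
    """Read every page in batches and return one flat list of DataFrames."""
    import tabula  # imported lazily: requires tabula-py and Java

    tables = []
    for page_range in page_batches(total_pages, batch_size):
        # With multiple_tables=True, read_pdf returns a list of DataFrames.
        tables.extend(
            tabula.read_pdf(
                pdf_path, pages=page_range, lattice=True, multiple_tables=True
            )
        )
    return tables
```

With the file from the question, this would be called as `tables = extract_all_tables("FSA.pdf", total_pages=388)`. The resulting list can then be combined with `pd.concat(tables)` or written out one sheet at a time with `DataFrame.to_excel`, depending on whether one table or 388 separate ones are wanted.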

PrakashT