I tried to extract the table below using Tabula, but it returned an empty DataFrame. The same code works fine for other, similar tables.

[Image: the table to be extracted]

I also tried Camelot, but that didn't work either. Any suggestions on how I can extract this table?

Here is my code:

from tabula import read_pdf
from tabulate import tabulate
import pandas as pd

# Attempt 1: tabula with its default table detection
Page_No = 1
tables = read_pdf('/content/page1.pdf', pages=Page_No, multiple_tables=True)
df1 = pd.DataFrame(tables[0])
df1

# Attempt 2: Camelot in lattice mode
import camelot

tables2 = camelot.read_pdf('page1.pdf', flavor='lattice', pages='1')
tables2
Pravin

1 Answer

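The issue was fixed by passing stream=True and guess=False to tabula's read_pdf. stream=True forces stream-mode extraction, which is meant for tables with no ruling lines between cells, and guess=False stops tabula from guessing which portion of the page contains the table.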

from tabula import read_pdf
from tabulate import tabulate
import pandas as pd

Page_No = 1
# stream=True: force stream-mode extraction (no ruling lines needed)
# guess=False: do not auto-detect the table area on the page
tables = read_pdf('/content/page1.pdf', pages=Page_No, guess=False, stream=True)
df1 = pd.DataFrame(tables[0])
df1
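
If it isn't clear up front which options a given PDF needs, one approach is to try a few combinations in order and keep the first one that returns a non-empty table. This is only a rough sketch, not an official tabula-py recipe: the helper names first_nonempty_tables and extract_with_tabula, and the OPTION_SETS list, are made up for illustration. It assumes read_pdf returns a list of DataFrames (the default with multiple_tables=True) and treats a table with zero rows as empty.

```python
# Hypothetical option sets to try in order; adjust for your own PDFs.
OPTION_SETS = (
    {"multiple_tables": True},         # the call from the question
    {"lattice": True},                 # tables with ruling lines
    {"stream": True, "guess": False},  # the combination that worked here
)


def first_nonempty_tables(read_fn, path, option_sets=OPTION_SETS, **common):
    """Call read_fn(path, **common, **options) for each option set in turn.

    Returns (options, tables) for the first option set that yields at least
    one table with rows, or (None, []) if every attempt comes back empty.
    """
    for options in option_sets:
        tables = read_fn(path, **common, **options)
        # len() of a DataFrame is its row count, so zero-row tables are dropped
        nonempty = [t for t in tables if len(t) > 0]
        if nonempty:
            return dict(options), nonempty
    return None, []


def extract_with_tabula(path, page=1):
    """Convenience wrapper; tabula is imported lazily so the helper above
    can be used and tested without tabula-py (and Java) installed."""
    from tabula import read_pdf
    return first_nonempty_tables(read_pdf, path, pages=page)
```

With the file from the question, this would be called as options, tables = extract_with_tabula('/content/page1.pdf', page=1), after which options shows which combination produced the result.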

Pravin