I'm extracting tables from a PDF with tabula-py. It extracts all the rows, but it doesn't split them into columns correctly. I used the sample PDF below for the extraction.
I tried the extraction with the code below:
import tabula
import json
import pandas as pd

path = "/GST_OCR input Pdfs/gst3.pdf"
# Read every column as a string
col2str = {"dtype": str}
kwargs = {
    "multiple_tables": True,
    "pandas_options": col2str,
    "lattice": False,
    "guess": False,
}
csv_data = tabula.read_pdf(path, pages="all", **kwargs)
# with pd.ExcelWriter(csv_data[1].iloc[0, 1] + ".xls", engine="xlsxwriter") as writer:
#     for i in range(len(csv_data)):
#         csv_data[i].to_excel(writer, sheet_name=f"Sheet {i+1}")
# Inspect the sixth extracted table
csv_data[5]
It isn't splitting the rows into columns properly; instead, it creates unnamed columns.
It extracts the table like this:
Please help me with this. Thanks in advance.
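
For reference, here is a minimal sketch of option combinations that may be worth trying. It rests on two assumptions that are not confirmed for this PDF: that the bad column split comes from running with `guess=False` and neither `stream` nor `lattice` enabled, and that the "Unnamed: N" labels appear because pandas treats the first extracted row as a header with blank cells. `build_read_options` and `extract_tables` are hypothetical helper names, and any column x-coordinates would have to be measured from the actual PDF. `stream`, `lattice`, `guess`, `columns`, and `pandas_options` are existing `tabula.read_pdf` parameters.

```python
def build_read_options(mode="stream", column_edges=None, first_row_is_header=True):
    """Build keyword arguments for tabula.read_pdf (hypothetical helper).

    mode: "stream" for whitespace-separated tables, "lattice" for tables
          with ruled cell borders.
    column_edges: optional x-coordinates (in PDF points) of column
          boundaries; only meaningful in stream mode.
    first_row_is_header: set to False if the first extracted row is data.
    """
    if mode not in ("stream", "lattice"):
        raise ValueError("mode must be 'stream' or 'lattice'")
    if column_edges is not None and mode != "stream":
        raise ValueError("column_edges is only used in stream mode")

    pandas_options = {"dtype": str}  # keep every cell as a string
    if not first_row_is_header:
        # header=None stops pandas from using the first row as column names,
        # so columns are numbered 0, 1, 2, ... instead.
        pandas_options["header"] = None

    options = {
        "pages": "all",
        "multiple_tables": True,
        "pandas_options": pandas_options,
        "stream": mode == "stream",
        "lattice": mode == "lattice",
        # Assumption: automatic area guessing is only switched off when
        # explicit column boundaries are supplied.
        "guess": column_edges is None,
    }
    if column_edges is not None:
        options["columns"] = list(column_edges)
    return options


def extract_tables(path, **option_args):
    """Run tabula.read_pdf with the options built above."""
    import tabula  # imported lazily; needs tabula-py and a Java runtime

    return tabula.read_pdf(path, **build_read_options(**option_args))
```

For example, `extract_tables(path, mode="lattice")` would suit a table drawn with cell borders, while `extract_tables(path, column_edges=[...], first_row_is_header=False)` pins the column boundaries by hand; the coordinate list is left elided because it depends on the document.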