Unable to retrieve DataFrame contents in CSV format using Python

Asked Dec 06 '21 at 13:04

I want to convert a PDF file into CSV, and I am using tabula-py for this. However, the output CSV contains only the column names, not the table contents. Please tell me what I am missing and how I can save the DataFrame to a CSV file so that all of the data ends up in it.

#!/usr/bin/env python3
import tabula
import pandas as pd

pdf_file = 'document-page1.pdf'
column_names = ['Product', 'Batch No', 'Machin No', 'Time', 'Date', 'Drum/Bag No',
                'Tare Wt.kg', 'Gross Wt.kg', 'Net Wt.kg', 'Blender', 'Remarks', 'Operator']

# Page 1 processing
# area = (top, left, bottom, right) in PDF points
# columns = x-coordinates of the column boundaries (12 boundaries -> 13 columns, 0-12)
df1 = tabula.read_pdf(pdf_file, pages=1, area=(95, 20, 800, 840),
                      columns=[93, 180, 220, 252, 310, 315, 333, 367, 410, 450, 480, 520],
                      pandas_options={'header': None})

# read_pdf returns a list of DataFrames; column 5 is the narrow 310-315 strip
df1[0] = df1[0].drop(columns=5)
df1[0].columns = column_names
# df1[0].head(2)

df1[0].to_csv('result.csv')
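One way to narrow the problem down is to separate the extraction step from the CSV step. The sketch below assumes `tabula.read_pdf` returns a list of header-less DataFrames, as it does with `pandas_options={'header': None}`; the helper name `tables_to_csv` and its `spacer_col` parameter are hypothetical. It prints the shape of each extracted table (a table with no rows usually suggests that the `area`/`columns` coordinates missed the table body, or that the PDF is a scanned image, which tabula cannot read), concatenates every returned table instead of only `df1[0]`, and writes the result with `index=False`.

```python
import io

import pandas as pd

# Column names copied from the question's code (spelling kept as-is)
COLUMN_NAMES = ['Product', 'Batch No', 'Machin No', 'Time', 'Date', 'Drum/Bag No',
                'Tare Wt.kg', 'Gross Wt.kg', 'Net Wt.kg', 'Blender', 'Remarks', 'Operator']


def tables_to_csv(tables, path_or_buf, spacer_col=5, names=COLUMN_NAMES):
    """Hypothetical helper: combine header-less DataFrames (assumed to come from
    tabula.read_pdf) into one CSV, skipping any table that came back empty."""
    cleaned = []
    for i, df in enumerate(tables):
        # Diagnostic: an empty table here means nothing was extracted for it
        print(f"table {i}: {df.shape[0]} rows x {df.shape[1]} columns")
        if df.empty:
            continue
        df = df.drop(columns=spacer_col)  # drop the narrow spacer column
        df.columns = list(names)
        cleaned.append(df)
    if not cleaned:
        raise ValueError("no rows extracted: check area/columns or whether the PDF is scanned")
    combined = pd.concat(cleaned, ignore_index=True)
    combined.to_csv(path_or_buf, index=False)  # index=False: no extra index column
    return combined


# Hypothetical stand-in for read_pdf output: 2 rows x 13 integer-named columns
sample = pd.DataFrame([[f"r{r}c{c}" for c in range(13)] for r in range(2)])
buf = io.StringIO()
tables_to_csv([sample], buf)
print(buf.getvalue())
```

With the variables from the question, `tables_to_csv(df1, 'result.csv')` would replace the last four lines of the original script. If the printed shape shows zero rows, the CSV can only ever contain the header, so the coordinates passed to `read_pdf` are the first thing to check.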

asked by linux01

Can you share your PDF? – Fatemeh Sangin Dec 06 '21 at 13:12
Kindly share the PDFs so I can look further. If you can't share them here, please send some to my email ID, if you don't mind. – Kathan Thakkar Dec 10 '21 at 06:40