Why is the data in the PDF written in the 1st column?

Question

I have a pdf file called Question.pdf, and its content is as follows.

I am converting my PDF file to an xlsx file using the Python tabula module. However, it writes all the data into the first column of my Excel file. How can I remove this part (the area marked in red)?

data.xlsx

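import tabula

# read_pdf returns a list of DataFrames (one per detected table); [1] selects the second one
df = tabula.read_pdf('Question.pdf', pages=1, lattice=True)[1]

# Replace carriage returns in the column names with spaces
df.columns = df.columns.str.replace('\r', ' ')
# Drop rows that contain missing values
data = df.dropna()
# Write to Excel without the DataFrame index column
data.to_excel('data.xlsx', index=False)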

score 0 · Accepted Answer · answered Sep 27 '22 at 17:06


Try this when exporting:

data.to_excel('data.xlsx', index=False, header=None)
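To see what the `index` and `header` arguments control, here is a minimal sketch with made-up data (the `Name`/`Score` table and the `export` helper are hypothetical, not taken from the question's PDF). It uses `to_csv` on an in-memory buffer so the result can be inspected as text; `to_excel` takes `index` and `header` arguments with the same meaning.

```python
import io

import pandas as pd

# Made-up data standing in for the table extracted from the PDF.
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Score": [90, 85]})


def export(frame, **kwargs):
    """Write the frame to an in-memory CSV and return it as a list of lines."""
    buf = io.StringIO()
    frame.to_csv(buf, **kwargs)
    return buf.getvalue().splitlines()


# Default: the index is written as an extra first column.
with_index = export(df)

# index=False drops that first column but keeps the header row.
with_header = export(df, index=False)

# header=False additionally drops the row of column names.
without_header = export(df, index=False, header=False)

for label, lines in [("default", with_index),
                     ("index=False", with_header),
                     ("index=False, header=False", without_header)]:
    print(label, lines)
```

If the unwanted part is the row of column names, dropping the header as above removes it; if it is an extra leading column, `index=False` is the argument that matters.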

Hope this Helps...


Sachin Kohli


Glad to Help... Drop a like or accept the best answer that works... To grow & motivate community... Happy Coding :) – Sachin Kohli Sep 27 '22 at 17:22
