I have a PDF file called Question.pdf with the following content:

Question.pdf

I am converting my PDF file to an xlsx file using the Python tabula module. However, it writes all the data into the first column of my Excel file. How can I delete this field (the part marked in red)?

data.xlsx

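import tabula

# read_pdf returns a list of DataFrames; [1] selects the second table found on page 1
df = tabula.read_pdf('Question.pdf', pages=1, lattice=True)[1]

# Replace carriage returns in the column names with spaces
df.columns = df.columns.str.replace('\r', ' ')
# Drop rows that contain missing values
data = df.dropna()
# Write to Excel without the DataFrame index
data.to_excel('data.xlsx', index=False)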
Yunus Emre

1 Answer

Try this when exporting:

data.to_excel('data.xlsx', index=False, header=None)
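
To see what the flag does, here is a minimal sketch, assuming the part marked in red is the header row (the screenshots are not reproduced here, so that is a guess). The small DataFrame is a hypothetical stand-in for the table tabula extracts. `to_csv` is used only because it takes the same `header` and `index` arguments as `to_excel` and its output is easy to inspect as text. The pandas documentation describes `header` as a bool or a list of strings, so the sketch passes `header=False`.

```python
import pandas as pd

# Hypothetical stand-in for the table extracted from the PDF.
df = pd.DataFrame({"Name": ["A", "B"], "Score": [1, 2]})

# Same header/index flags as to_excel; CSV text is used only so the
# effect is visible without opening a spreadsheet.
with_header = df.to_csv(index=False)
without_header = df.to_csv(index=False, header=False)

print(with_header)     # first line is the column names
print(without_header)  # data rows only

# The equivalent Excel export would be:
# df.to_excel("data.xlsx", index=False, header=False)
```

If the unwanted part turns out to be a column rather than the header row, this flag will not remove it, and the column would have to be dropped from the DataFrame before exporting.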

Hope this helps.

Sachin Kohli
Glad to help. Drop a like or accept the best answer that works, to grow and motivate the community. Happy coding :) – Sachin Kohli Sep 27 '22 at 17:22