I have a PDF with data in tabular format. The table has 6 columns, but the columns are not separated by borders, so when I extract the data using pdfplumber, everything ends up in a single cell. I want each column in a separate cell.

How could I do that?

For your reference:

15/03/2021 RTGS-UTIBR52021031300662458-VIRENDER KUMAR 2,60,635.00 2,94,873.94Cr 11/03/2021 IMPS/P2A/107018040382/XXXXXXXXXX0980/trf 49,500.00 34,238.94Cr 11/03/2021 IMPS/P2A/107018771795/KINGDOMHOTELAND/trf 35,000.00 83,738.94Cr

Thanks in advance

arvin

1 Answer

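You can use the extract_tables() method to get the tables and load them into a pandas DataFrame. Because your table has no ruled borders, the default line-based detection may not find the columns; passing table_settings with "vertical_strategy": "text" and "horizontal_strategy": "text" tells pdfplumber to infer them from text alignment instead.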

The code below handles only the first page (index 0). Use a for loop over pdf.pages to extract the tables from all pages.

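import pandas as pd
import pdfplumber

path = file_path  # path to your PDF

# No ruled borders, so infer rows and columns from text alignment.
settings = {"vertical_strategy": "text", "horizontal_strategy": "text"}
with pdfplumber.open(path, password="") as pdf:
    tables = pdf.pages[0].extract_tables(table_settings=settings)
table = pd.DataFrame(tables[0])  # first table found on the page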

Change the code as per your requirements.
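If table extraction still returns everything in one cell, one possible fallback is to take the raw text (for example from page.extract_text()) and split it with a regular expression. The following is only a minimal sketch based on the sample in the question. It assumes every row has a dd/mm/yyyy date, a description, one amount, and a balance ending in Cr or Dr (the Dr suffix is a guess; only Cr appears in the sample). The name parse_rows and the field names are hypothetical, and rows with more filled-in columns would need a different pattern.

```python
import re

# Assumed row layout: date, description, amount, balance with a Cr/Dr suffix.
ROW_PATTERN = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2})\s+"
    r"(?P<balance>[\d,]+\.\d{2}(?:Cr|Dr))"
)


def parse_rows(text):
    """Return one dict per transaction row found in the extracted text."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(text)]


# Sample text copied from the question.
sample = (
    "15/03/2021 RTGS-UTIBR52021031300662458-VIRENDER KUMAR 2,60,635.00 2,94,873.94Cr "
    "11/03/2021 IMPS/P2A/107018040382/XXXXXXXXXX0980/trf 49,500.00 34,238.94Cr "
    "11/03/2021 IMPS/P2A/107018771795/KINGDOMHOTELAND/trf 35,000.00 83,738.94Cr"
)

for row in parse_rows(sample):
    print(row)
```

The resulting list of dicts can be passed directly to pd.DataFrame() to get one column per field.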

jainam shah