I am new to pandas. I have a PDF data table which I read and extract the data from, converting it into an Excel file. This is a short, fictitious example of the table I am using, however the structure of the PDF is exactly the same:
After converting the PDF to Excel, I get this: The table is complete and the conversion was successfully completed. excel file
Notice that each word within the single quotes, are a column within my original PDF. The commas, are some empty columns that are in the PDF.
I would like to know how I can turn those words into columns again, using pandas? I want to get a result similar to this: final excel
My simple code is below:
import pdfplumber
import pandas as pd
from pandas import ExcelWriter
with pdfplumber.open('Sistema BanPara.pdf') as pdf:
df = pd.DataFrame(page.extract_table() for page in pdf.pages).astype(str)
with pd.ExcelWriter('table.xlsx') as writer:
df.to_excel(writer)