I have a fillable PDF that contains multiple tables and I want to convert it to a CSV file using Python.
When I try to open the PDF with anything other than Adobe Acrobat Reader, I get this message.
What I have tried so far:
import PyPDF2
import fitz
import csv
from tkinter import Tk
from tkinter.filedialog import askopenfilename
# Prompt user to select input PDF file
root = Tk()
root.withdraw()
pdf_file_path = askopenfilename(filetypes=[("PDF files", "*.pdf")])
# Open the PDF file
pdf_file = fitz.open(pdf_file_path)
# Extract text from PDF file
pdf_text = ''
for page_num in range(pdf_file.page_count):
    page = pdf_file[page_num]
    pdf_text += str(page.get_text("text"))
# Split text into lines and columns
lines = pdf_text.split('\n')
columns = [line.split('\t') for line in lines]
# Save data to CSV file
csv_file_path = pdf_file_path.replace('.pdf', '.csv')
with open(csv_file_path, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for row in columns:
        writer.writerow(row)
print(f"CSV file saved at {csv_file_path}")
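A side note on the output path: `str.replace` swaps every occurrence of '.pdf' in the path and misses an uppercase '.PDF' extension. This is a minimal sketch of a more robust derivation using pathlib. `csv_path_for` is a hypothetical helper name that is not part of my script above.

```python
from pathlib import Path


def csv_path_for(pdf_path: str) -> str:
    """Return the path of a CSV file that sits next to the given PDF.

    Hypothetical helper for illustration; not part of the original script.
    """
    # with_suffix only changes the final extension, regardless of its case,
    # and leaves any '.pdf' that appears earlier in the path untouched.
    return str(Path(pdf_path).with_suffix(".csv"))


print(csv_path_for("form.PDF"))
```

In the script, this would stand in for the `pdf_file_path.replace('.pdf', '.csv')` line.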
What I have observed so far is that pdf_file.page_count is always 1, and if I run
print(pdf_text)
I get the exact same message as when trying to open the PDF without Adobe Acrobat Reader.
The saved CSV file ends up containing this error message.
I wonder if this is strictly because of how the PDF file was created.
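One way I could test that guess, assuming the message is the placeholder page that Adobe XFA (LiveCycle) forms show in non-Adobe viewers, is to look for an /XFA key in the raw file. This is only a rough sketch using the standard library. `looks_like_xfa` and `file_looks_like_xfa` are hypothetical helper names, and the byte search is a heuristic: it can miss the key when the form dictionary is stored inside a compressed object stream.

```python
def looks_like_xfa(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain an /XFA key.

    Hypothetical helper; a False result does not rule out XFA, because
    the key may sit inside a compressed object stream.
    """
    return b"/XFA" in pdf_bytes


def file_looks_like_xfa(path: str) -> bool:
    """Read the file as bytes and apply the heuristic above."""
    with open(path, "rb") as fh:
        return looks_like_xfa(fh.read())
```

Calling `file_looks_like_xfa(pdf_file_path)` on my file and getting True would suggest the tables are stored as XFA form data rather than as page content, which might explain both the single page and the placeholder text.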