I have been trying to use the Camelot library to capture a table that isn't really formatted as a table, so I set the flavor parameter to 'stream'. However, it does not detect the entire table. So I decided to try detecting the entire page instead, by passing an area parameter built from the page's dimensions.
I have tried the following code, but it still does not cover the whole page.
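import camelot
# PdfReader, .pages and .mediabox replace PdfFileReader, getPage() and mediaBox,
# which were removed in PyPDF2 3.0
from PyPDF2 import PdfReader

pdf_path = r'C:\Users\PC\PycharmProjects\finstate.pdf'

# PyPDF2 indexes pages from 0, so pages[10] is the 11th page of the file
page = PdfReader(pdf_path).pages[10]
width = float(page.mediabox.width)
height = float(page.mediabox.height)
print("Width:", width)
print("Height:", height)

# read_pdf has no 'area' argument; stream and lattice take 'table_areas', a list of
# "x1,y1,x2,y2" strings where (x1, y1) is the top-left and (x2, y2) the bottom-right
# corner in PDF coordinates (origin at the bottom-left of the page)
page_area = [f"0,{height},{width},0"]

# Camelot numbers pages from 1, so the 11th page is '11' ('0-10' is not valid)
tables = camelot.read_pdf(pdf_path, pages='11', flavor='stream', table_areas=page_area)

# Only one page with one area was parsed, so its table is tables[0], not tables[10]
first_table = tables[0]
print(first_table.df)
first_table.to_csv(r'C:\Users\PC\Desktop\table.csv')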
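Two details are easy to get wrong here: the area string Camelot expects, and the off-by-one between PyPDF2's 0-based page indexes and Camelot's 1-based pages string. The sketch below uses two hypothetical helpers, full_page_area and camelot_pages (my own names, not part of Camelot or PyPDF2). It assumes Camelot's "x1,y1,x2,y2" format with (x1, y1) at the top-left and (x2, y2) at the bottom-right, and that the page's mediabox describes the area you want. It needs neither library, so it can be checked on its own.

```python
from typing import Iterable, List


# Hypothetical helper (not a Camelot API): build a table_areas list that spans
# the whole page, given the mediabox corners in PDF points.
def full_page_area(left: float, bottom: float, right: float, top: float) -> List[str]:
    # Camelot wants "x1,y1,x2,y2" = top-left then bottom-right,
    # with the origin at the bottom-left corner of the page.
    return [f"{left},{top},{right},{bottom}"]


# Hypothetical helper (not a Camelot API): convert 0-based page indexes, as used
# by PyPDF2's reader.pages, into Camelot's 1-based pages string such as "1-3,6".
def camelot_pages(zero_based: Iterable[int]) -> str:
    pages = sorted({i + 1 for i in zero_based})
    if not pages:
        raise ValueError("need at least one page index")

    parts = []
    start = prev = pages[0]
    for p in pages[1:]:
        if p == prev + 1:
            # Still inside a run of consecutive pages
            prev = p
            continue
        # Close the current run and start a new one
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = p
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)
```

With these helpers, the 11th page becomes camelot_pages([10]), which gives "11", and a 0-based range(0, 11) becomes "1-11". Passing all four mediabox corners (page.mediabox.left, .bottom, .right, .top) rather than assuming (0, 0) is a cautious choice. On most PDFs the lower-left corner is (0, 0) anyway, and then full_page_area(0, 0, width, height) is equivalent.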