(I know that pdfplumber is mainly geared towards computer-generated PDFs. However, before I spend a couple of days hand-typing data from my scanned PDFs, I thought I'd ask whether pdfplumber could somehow help me.)
My problem:
I have scanned PDFs from historical books.
Example: data from a statistical yearbook
Now I'm trying to extract the table (the one in the lower right of the example) from the scanned PDF.
My first attempts at extracting the table with pdfplumber didn't work. For example, this:

    import pdfplumber

    with pdfplumber.open('test.pdf') as pdf:
        page = pdf.pages[0]
        tables = page.extract_tables()
        print(tables)

returned None.
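One likely reason for the empty result is that a scanned page has no text layer: pdfplumber works from the text objects embedded in the PDF, and a scan is usually just an image. A minimal diagnostic sketch, assuming a recent pdfplumber where `page.chars` and `page.images` are lists; `describe_page` and `report` are hypothetical helper names, not part of pdfplumber:

```python
def describe_page(page):
    """Summarise what pdfplumber sees on one page (hypothetical helper)."""
    n_chars = len(page.chars)    # text characters embedded in the PDF
    n_images = len(page.images)  # embedded raster images, e.g. the scan itself
    return {
        "chars": n_chars,
        "images": n_images,
        # No embedded characters but at least one image: probably a pure scan.
        "looks_scanned": n_chars == 0 and n_images > 0,
    }


def report(path):
    """Print a per-page summary for the PDF at `path` (hypothetical helper)."""
    import pdfplumber  # imported lazily so describe_page() works without it

    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            print(number, describe_page(page))
```

Calling `report('test.pdf')` should show whether the pages carry any characters at all. If `chars` is 0, `extract_tables()` has no text to group into cells, and the scan would need an OCR step that adds a text layer before pdfplumber can do anything with it.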
Is there any hope of extracting this kind of data automatically, or should I just suck it up?
Thanks in advance for any help or advice!