I am trying to loop over several multi-page PDF files and scrape their tables into Excel files. However, neither camelot nor tabula is able to process the PDF files:
# pip install --upgrade camelot-py[cv] tabula-py excalibur-py
import tabula as tb
import camelot
import pandas as pd
import os
BASE_PATH = os.path.dirname(os.path.abspath(r"..."))
FOLDER_PATH = os.path.join(BASE_PATH, r"...")
# os.listdir returns bare file names, so join each one with the listed folder
pdfs = [os.path.abspath(os.path.join(r"...", x)) for x in os.listdir(r"...") if x.endswith(".pdf")]
#
listoflengths = []
def len_table(filepath):
    tables = camelot.read_pdf(filepath, flavor='stream', columns=['300'], split_text=True)
    tablelength = len(tables)
    listoflengths.append(tablelength)
#
pdfs[0]
len_table(pdfs[1])
# print(listoflengths)
Is there any solution to this? I want to avoid manually copying tables from the PDF files into Excel.
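To make the goal concrete, here is a minimal sketch of the loop I am aiming for. It assumes camelot's `stream` flavor can detect the tables at all, which is exactly what is not working for me yet. The helper names `list_pdfs` and `pdf_to_excel`, the `table_N` sheet names, and the one-workbook-per-PDF layout are my own illustrative choices, not anything prescribed by camelot or pandas.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the full paths of all PDF files directly inside `folder`, sorted."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def pdf_to_excel(pdf_path, out_dir):
    """Write every table camelot finds in one PDF to its own sheet of one workbook."""
    # Third-party imports live here so list_pdfs works without them installed.
    import camelot
    import pandas as pd

    pdf_path = Path(pdf_path)
    # pages="all" scans the whole document; camelot reads only page 1 by default.
    tables = camelot.read_pdf(str(pdf_path), pages="all", flavor="stream")
    if len(tables) == 0:
        return None  # nothing detected, so no workbook is written

    out_path = Path(out_dir) / (pdf_path.stem + ".xlsx")
    with pd.ExcelWriter(out_path) as writer:
        for i, table in enumerate(tables, start=1):
            # table.df is the pandas DataFrame camelot built for this table
            table.df.to_excel(writer, sheet_name=f"table_{i}", index=False, header=False)
    return out_path
```

The intended use would be something like `for pdf in list_pdfs(FOLDER_PATH): pdf_to_excel(pdf, FOLDER_PATH)`, so that each PDF produces one `.xlsx` file next to it.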