So I'm currently writing some code to scrape a bunch of PDFs for information. I don't want every page returned, since some aren't useful, and I've already solved that part, but I keep getting a message saying "The output file is empty". I know which pages cause it and I don't use them later in the code, but I don't want the message to keep printing when I run it over all the PDFs. I'm using tabula-py and PyPDF2 to get the info I need.
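import PyPDF2 as pp2
import tabula as tb

def test_pages(path, area, value):
    """Return the 1-based page numbers whose first table contains `value`."""
    valid_pages = []
    # Count the pages once; the with-block closes the file afterwards
    with open(path, 'rb') as file:
        total_pages = len(pp2.PdfReader(file).pages)
    for page in range(1, total_pages + 1):
        try:
            # Use the `area` argument and pass the path, since the file is closed by now
            table = tb.read_pdf(path, pages=page, area=area, pandas_options={'header': None}, output_format="dataframe")
            # read_pdf returns an empty list when it finds no table on the page
            if table and table[0].isin([value]).any().any():
                valid_pages.append(page)
        except Exception:
            pass  # skip pages tabula cannot parse
    return valid_pages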
That's the code; the message comes from the read_pdf call inside the try block. Is there a way to stop it from printing the message?
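One targeted option is a `logging.Filter` that drops only this message and leaves tabula's other warnings visible. This is a minimal sketch. It assumes tabula-py reports "The output file is empty." through Python's standard `logging` module on a logger named `tabula.io`, so check this against your installed version. The `fake_read_pdf` function is a hypothetical stand-in that mimics the library, so the sketch runs without tabula.

```python
import logging

# Assumption: tabula-py logs "The output file is empty." on the "tabula.io"
# logger. A Filter attached to a logger only sees records logged directly on
# that logger (not on its children), so it goes on the emitting logger itself.
EMPTY_MSG = "The output file is empty"


class DropEmptyOutput(logging.Filter):
    """Reject only the 'output file is empty' records; let everything else through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return EMPTY_MSG not in record.getMessage()


tabula_logger = logging.getLogger("tabula.io")
tabula_logger.addFilter(DropEmptyOutput())


def fake_read_pdf() -> list:
    """Hypothetical stand-in for tb.read_pdf on a page without a table."""
    tabula_logger.warning("The output file is empty.")
    return []
```

Compared with raising the logger level, this keeps genuine tabula warnings visible and only hides the one known message.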
Thanks
I've already tried passing silent=True to the read_pdf() function, but that doesn't stop it.
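As far as I understand, silent=True only suppresses stderr output from the tabula-java subprocess. If the message is instead emitted on the Python side, as assumed here, then raising the Python logger's level is what silences it. This is a minimal sketch, assuming tabula-py logs under the "tabula" package namespace (e.g. "tabula.io"). The `quiet_logger` helper and the `fake_read_pdf` stand-in are hypothetical names used for illustration.

```python
import logging
from contextlib import contextmanager
from typing import Iterator

# Assumption: tabula-py logs "The output file is empty." on a logger under the
# "tabula" namespace (e.g. "tabula.io"). Child loggers left at NOTSET take their
# effective level from the parent, so raising "tabula" also covers "tabula.io".


@contextmanager
def quiet_logger(name: str = "tabula", level: int = logging.ERROR) -> Iterator[None]:
    """Temporarily raise a logger's level, restoring the previous level on exit."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)


def fake_read_pdf() -> list:
    """Hypothetical stand-in for tb.read_pdf on a page with no table."""
    logging.getLogger("tabula.io").warning("The output file is empty.")
    return []
```

In the loop above, this would mean wrapping the read_pdf call in `with quiet_logger():`. To silence the message for the whole run instead, you could call `logging.getLogger("tabula").setLevel(logging.ERROR)` once near the top of the script.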