I'm writing some code to scrape a batch of PDFs for information, but I don't want every page returned, since some aren't useful. I've already solved that part, but I keep getting a message saying "The output file is empty". I know which pages cause it, and I don't use them later in the code, but I don't want the message printed over and over when I run this across all the PDFs. I'm using tabula-py and PyPDF2 to get the information I need.

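import PyPDF2 as pp2
import tabula as tb

def test_pages(path, area, value):
    # Note: `value` is not used yet; the search term below is hard-coded.
    valid_pages = []

    # Count the pages, closing the file once PyPDF2 is done with it
    with open(path, 'rb') as file:
        total_pages = len(pp2.PdfReader(file).pages)

    for page in range(1, total_pages + 1):
        try:
            # read_pdf returns a list of DataFrames (empty if nothing was found)
            table = tb.read_pdf(path, pages=page, area=area, pandas_options={'header': None}, output_format="dataframe")
            if table and table[0].isin(["GTIN"]).any().any():
                valid_pages.append(page)
        except Exception:
            # Skip pages that tabula cannot parse
            pass

    return valid_pages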

That's the code. The line that produces the message is the read_pdf() call inside the try block. Is there a way to stop it from printing? Thanks.

I've already tried passing silent=True to read_pdf(), but that doesn't stop it.
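
One thing that might be worth trying, sketched here under some assumptions rather than as a confirmed fix: if the message is emitted through Python's logging or warnings machinery (in recent tabula-py versions it appears to come from a logger under the "tabula" namespace, while silent=True seems to affect only the Java process's stderr), then raising that logger's level around the call should hide it. The helper name `quiet` and the default logger name "tabula" are assumptions for illustration, not part of tabula-py.

```python
import contextlib
import logging
import warnings


@contextlib.contextmanager
def quiet(logger_name="tabula", level=logging.ERROR):
    """Temporarily hide log records below `level` and all Python warnings.

    `logger_name="tabula"` is an assumption about where the message
    originates; child loggers such as "tabula.io" inherit this level.
    """
    logger = logging.getLogger(logger_name)
    previous_level = logger.level
    logger.setLevel(level)
    try:
        # Also ignore warnings.warn(...) calls, in case an older
        # version reports the message that way instead.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            yield
    finally:
        # Restore the logger so later messages are not lost.
        logger.setLevel(previous_level)
```

To use it, the read_pdf() call would go inside a `with quiet():` block. Errors and exceptions still come through; only records below ERROR on that logger, plus Python warnings, are hidden while the block runs.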

jdah97