
I am using tabula-py to download and extract tables from PDFs via a list of URLs. The URLs are created based on rules and everything works fine except when Tabula tries to process a PDF from a link with no page/file behind it (specifically weekends, as PDFs aren't published on weekends).

Full Python script below.

I want the script to skip any errors it runs into (specifically when attempting to pull from a weekend-based URL) and continue processing.

Any ideas?

import datetime
import pickle

import pandas
import tabula

# create text file

df=open('urls.txt','w')



# Example list

start = datetime.datetime(2022, 11, 1)
end = datetime.datetime(2022, 11, 11)
delta = datetime.timedelta(days=1)

pdf_path='https://www.irishprisons.ie/wp-content/uploads/documents_pdf/{date1:%d-%B-%Y}.pdf'

while start < end:
    date1 = start
    date2 = start + delta
    url = pdf_path.format(date1=date1, date2=date2)


# Save list and stop loop
    df.write(url)
    start = date2  

# Extract table from PDF available at url

    path = url
    # Make the most recent
    #path = "https://www.irishprisons.ie/wp-content/uploads/documents_pdf/11-November-2022.pdf"

    dfs = tabula.read_pdf(path, pages='1', lattice=True, stream=True, pandas_options={'header':None})


    try:
        new_header = dfs[0].iloc[1]
        inmate_count = dfs[0].drop(labels=0, axis=0)
        inmate_count.columns = [new_header]
        inmate_count=inmate_count.dropna(how='all').reset_index(drop=True)
        inmate_count = inmate_count.drop(labels=[0], axis=0)
        inmate_count['url'] = path
        inmate_count.to_csv("first_table.csv", mode='a', header=False, index=False)
        print(inmate_count)
    except Exception:
        pass

print("Finished")

I've tried try/except, but I'm unfamiliar with it and it doesn't seem to do anything.

Mark Rotteveel
  • If this code does not do what you want, then show us the output, and explain how that is different from what you wanted. – John Gordon Nov 13 '22 at 14:58
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Nov 13 '22 at 15:00
  • If you want to skip errors that are related to fetching the url, then it seems like the call to `read_pdf()` belongs inside the try/except block... – John Gordon Nov 13 '22 at 15:01

1 Answer


You can write a separate try/except for each independent piece of work so that when one fails, the others still run:

try:
  foo = func1()
  foo.func2()
except Exception:
  print("this failed")

try:
  mom = func3()
except Exception:
  print("this failed")

try:
  func4()
except Exception:
  print("this failed")
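Applied to the script in the question, the key change is to move the call that actually fails (`tabula.read_pdf`) inside the `try` block, so a missing weekend PDF is logged and skipped instead of crashing the loop. Below is a minimal sketch of that pattern; `fetch_table` is a hypothetical stand-in for the `tabula.read_pdf` call, simulating the failure that occurs on weekend URLs:

```python
import datetime

def fetch_table(url, date):
    # Hypothetical stand-in for tabula.read_pdf: no PDF is published on
    # weekends, so simulate the error a missing file would raise.
    if date.weekday() >= 5:  # Saturday=5, Sunday=6
        raise FileNotFoundError(f"no PDF published at {url}")
    return f"table for {date:%d-%B-%Y}"

start = datetime.datetime(2022, 11, 1)
end = datetime.datetime(2022, 11, 11)
delta = datetime.timedelta(days=1)
pdf_path = 'https://www.irishprisons.ie/wp-content/uploads/documents_pdf/{date1:%d-%B-%Y}.pdf'

tables = []
while start < end:
    url = pdf_path.format(date1=start)
    try:
        # The call that can fail goes INSIDE the try block.
        tables.append(fetch_table(url, start))
    except Exception as exc:
        # Log and continue with the next date instead of stopping.
        print(f"skipping {url}: {exc}")
    start += delta

print(f"extracted {len(tables)} tables")
```

In the real script, the same structure applies: wrap `dfs = tabula.read_pdf(...)` and the DataFrame processing together in one `try`, and `except` will then catch the fetch failure as well as any parsing errors.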
Glaucon