I'm writing a script to download the entire collection of BBC podcasts from various show hosts. My script uses BS4, Mechanize, and wget.
I would like to know how I can test whether a request for a URL yields a '404' response code from the server. I have written the following function:
    import mechanize

    def getResponseCode(br, url):
        print("Opening: " + url)
        try:
            response = br.open(url)
            print("Response code: " + str(response.code))
            return True
        except (mechanize.HTTPError, mechanize.URLError) as e:
            if isinstance(e, mechanize.HTTPError):
                print("Mechanize error: " + str(e.code))
            else:
                print("Mechanize error: " + str(e.reason.args))
            return False
I pass into it my Browser() object and a URL string. It returns either True or False depending on whether the response is a '404' or '200' (well actually, Mechanize throws an Exception if it is anything other than a '200', hence the exception handling). In main() I am basically looping over this function, passing in a number of URLs from a list of URLs that I have scraped with BS4. When the function returns True, I proceed to download the MP3 with wget.
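For context, the loop in main() can be sketched like this. The function name download_available, the check parameter, and the wget -P flag are illustrative assumptions, not the actual script; in the real script, check would be getResponseCode:

```python
import subprocess

def download_available(br, urls, check, dest_dir="."):
    """Illustrative sketch of the main() loop described above.

    `check` stands in for getResponseCode(br, url); wget is invoked
    only for URLs that pass the check. All names here are hypothetical.
    """
    downloaded = []
    for url in urls:
        if check(br, url):
            # -P tells wget which directory to save the file into
            subprocess.run(["wget", "-P", dest_dir, url], check=True)
            downloaded.append(url)
    return downloaded
```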
However, my problem is:
- The URLs are direct paths to the podcast MP3 files on the remote server, and I have noticed that when the URL is available, br.open(<URL>) will hang. I suspect this is because Mechanize is caching/downloading the actual data from the server. I do not want this, because I merely want to return True if the response code is '200'. How can I avoid caching/downloading the data and just test the response code?
I have tried using br.open_novisit(url, data=None), however the hang still persists...