HTTP Error 503 when reading xlsx file from url

Question

I'm trying to import the following Excel file into pandas: https://rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b2/hb2-daily-close.xlsx

I tried the following:

```python
import pandas as pd

url = "https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b2/hb2-daily.xlsx"
df = pd.read_excel(url, sheet_name="Data", header=4, usecols="A,H")
```

but I get the following HTTPError: HTTP Error 503: Service Temporarily Unavailable

I thought the problem was missing request headers, so I tried the following, but I keep getting the same error:

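```python
from io import BytesIO
from urllib.request import Request, urlopen

import pandas as pd

url = "https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b2/hb2-daily-close.xlsx"
req = Request(url)
req.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0")
content = BytesIO(urlopen(req).read())  # the 503 is raised here, before pandas parses anything
df = pd.read_excel(content, sheet_name="Data", header=4, usecols="A,H")
```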
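As an aside, pandas 1.2 and later can send the header without a manual Request/urlopen: for http(s) URLs, `read_excel` forwards the `storage_options` dict to `urllib.request.Request` as request headers. A minimal sketch, reusing the URL and User-Agent from the question (the `read_rates` name is hypothetical). It sends essentially the same request as the attempt above, so if Cloudflare is blocking non-browser clients it should fail with the same 503.

```python
import pandas as pd

URL = "https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b2/hb2-daily-close.xlsx"
# Browser-like User-Agent copied from the question.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0"}


def read_rates(url=URL, headers=HEADERS):
    # For http(s) URLs, pandas (>= 1.2) passes storage_options to
    # urllib.request.Request as headers, so no explicit urlopen is needed.
    return pd.read_excel(url, sheet_name="Data", header=4, usecols="A,H",
                         storage_options=headers)
```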

Any thoughts? Thanks

PS: It looks like the website is protected by Cloudflare (see the related question "How to get around Newspaper throwing 503 exceptions for certain webpages"). Probably Selenium is the only solution here.
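One way to check whether the 503 comes from Cloudflare rather than from a missing header is to inspect the error response itself. A minimal standard-library sketch; `looks_like_cloudflare_block` and `diagnose` are hypothetical names, and the test is a heuristic (an assumption): Cloudflare's bot checks typically answer with 403 or 503 and identify themselves with a `Server: cloudflare` or `CF-RAY` header.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def looks_like_cloudflare_block(status, headers):
    """Heuristic: a 403/503 whose headers name Cloudflare as the responder."""
    server = (headers.get("Server") or "").lower()
    return status in (403, 503) and ("cloudflare" in server or headers.get("CF-RAY") is not None)


def diagnose(url, user_agent="Mozilla/5.0"):
    """Fetch url once and describe the outcome as a short string."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=30) as resp:
            return f"HTTP {resp.status}: OK"
    except HTTPError as err:
        # err.headers holds the response headers of the failed request.
        if looks_like_cloudflare_block(err.code, err.headers):
            return f"HTTP {err.code} from Cloudflare: probably a bot check, not a missing header"
        return f"HTTP {err.code}: {err.reason}"
```

If `diagnose` reports a Cloudflare block, changing headers alone is unlikely to help, which is consistent with the guess that a real browser is needed.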

score 0 · Answer 1 · answered Oct 08 '21 at 21:26


It appears that when you added the headers, you removed the www from the start of the URL. That address (without the www) returns an HTTP 301 redirect, which urllib does not follow automatically.

Try adding the www back to your URL when fetching with the headers and see if that works (or switch to using requests).
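For the requests route, a minimal sketch assuming `requests`, `pandas` and `openpyxl` are installed; `fetch_xlsx` and `read_rates` are hypothetical helper names. requests follows the www/non-www 301 redirect on its own, but if the server is answering with a Cloudflare 503, this will fail the same way.

```python
import io

import pandas as pd
import requests

URL = "https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b2/hb2-daily-close.xlsx"
# Browser-like User-Agent copied from the question.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0"}


def fetch_xlsx(url, session=None, headers=HEADERS, timeout=30):
    """Download the workbook and return its raw bytes."""
    getter = session.get if session is not None else requests.get
    # GET follows redirects by default, so a 301 to the www host is handled.
    resp = getter(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # a 503 surfaces here as requests.HTTPError
    return resp.content


def read_rates(xlsx_bytes):
    """Parse the 'Data' sheet with the same options as in the question."""
    return pd.read_excel(io.BytesIO(xlsx_bytes), sheet_name="Data", header=4, usecols="A,H")
```

Usage, with network access: `df = read_rates(fetch_xlsx(URL))`. Accepting an optional session keeps the download step testable without a network and lets you reuse a `requests.Session` across several files.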

Da Chucky


Adding the www didn't work. What would the code with requests look like? I'm afraid Selenium is the only solution (see the question update). – younggotti Oct 08 '21 at 21:32

Based on the note you added, I think you are right that Selenium, or another tool that drives a real browser, would be a better option, but even that might not work. Your best bet might be to find an alternative source for the data, or to contact the host and ask whether they offer another way to get it automatically. – Da Chucky Oct 08 '21 at 21:54
Selenium worked flawlessly – younggotti Oct 08 '21 at 22:06
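The thread doesn't include the Selenium code, so the following is only a minimal sketch of that approach, assuming Selenium 4 with Chrome; `download_with_selenium` and `wait_for_download` are hypothetical names. The idea is that a real browser can pass Cloudflare's JavaScript check, and Chrome then saves the .xlsx to a chosen folder, which you poll until the download finishes.

```python
import glob
import os
import time


def wait_for_download(download_dir, pattern="*.xlsx", timeout=60.0, poll=0.5):
    """Poll download_dir until a finished file matching pattern appears.

    Chrome writes in-progress downloads as *.crdownload, so we also wait
    until none of those remain.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done = glob.glob(os.path.join(download_dir, pattern))
        partial = glob.glob(os.path.join(download_dir, "*.crdownload"))
        if done and not partial:
            return done[0]
        time.sleep(poll)
    raise TimeoutError(f"no {pattern} in {download_dir} after {timeout}s")


def download_with_selenium(url, download_dir):
    # Imported here so wait_for_download works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    })
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # Chrome downloads the .xlsx instead of rendering it
        return wait_for_download(download_dir)
    finally:
        driver.quit()
```

Usage, assuming an existing `downloads` folder: `df = pd.read_excel(download_with_selenium(url, "downloads"), sheet_name="Data", header=4, usecols="A,H")`. If Cloudflare shows a challenge page first, a longer `timeout` may be needed.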
