
When I run this code in IPython (Python 2.7):

from requests import get
_get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
print _get.url
_get.raise_for_status()
_get.json()

I am getting:

http://stats.nba.com/stats/playergamelog?PlayerID=203083&Season=2015-16&SeasonType=Regular+Season
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-5-8f8343b2c4cd> in <module>()
      1 _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
      2 print _get.url
----> 3 _get.raise_for_status()
      4 _get.json()

/Library/Python/2.7/site-packages/requests/models.pyc in raise_for_status(self)
    849 
    850         if http_error_msg:
--> 851             raise HTTPError(http_error_msg, response=self)
    852 
    853     def close(self):

HTTPError: 400 Client Error: Bad Request

However, if I go to the URL in my browser, it works. Then, when I come back and run the code again after manually visiting the URL in my browser (Chrome, the same browser IPython is running in), the code runs with no error. However, it may go back to raising the error on subsequent executions.

This code has worked for me hundreds if not thousands of times with no issue. How do I fix this error?

Thanks.

andingo

1 Answer


HTTPError: 400 Client Error: Bad Request means the request you made has an error. I think the server checks some headers in the HTTP request, for example the User-Agent.

So I tried setting the User-Agent header to mimic Firefox:

# No User-Agent
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'})
>>> _get.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\requests\models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://stats.nba.com/stats/playergamelog?PlayerID=203082&Season=2015-16&SeasonType=Regular+Season

# This time, set user-agent to mimic a desktop browser
>>> headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'}, headers=headers)
>>> _get.raise_for_status()
>>>
# no error

The reason it works after you visit the URL in a browser is caching.

According to Alastair McCormack, stats.nba.com is fronted by Akamai CDN, so the caching is probably happening at the edge, "varied" by the query string/URI rather than extraneous headers. Once a valid response has been made for that URI, it is cached by the CDN edge node serving that client.

So when you run the code after visiting the URL in your browser, the CDN returns the cached response, and no 400 is raised in that situation.
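
For reference, applying the same fix to the exact call from the question should look something like this (only the headers argument is added; everything else is copied from the question):

from requests import get

# A desktop browser User-Agent, same as in the example above
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}

_get = get('http://stats.nba.com/stats/playergamelog',
           params={'PlayerID': 203083, 'Season': '2015-16', 'SeasonType': 'Regular Season'},
           headers=headers)
print _get.url
_get.raise_for_status()  # should no longer raise a 400 once the header is set
_get.json()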

realli
  • `stats.nba.com` is fronted by Akamai CDN, so the caching is probably happening at the edge, "varied" by the query string/URI rather than extraneous headers. Once a valid response has been made for that URI, it is cached by the CDN edge node serving that client. It looks like the server requires a desktop-type User-Agent to be set to make a valid response and therefore a cacheable item. Good spot! – Alastair McCormack Jan 26 '16 at 09:49
  • @AlastairMcCormack, I think you are right about caching – realli Jan 26 '16 at 09:56
  • I think I understand the logic you laid out, but in case I don't, from a purely practical standpoint does this mean I shouldn't encounter this caching issue ever again if I specify a valid user agent? – andingo Jan 26 '16 at 13:38
  • @andingo Yes, everything should be OK if the user agent is valid. The missing user agent is your problem; the server responds with 400 because of it. Caching just makes your code run without error after you visit the URL in a browser, since the CDN returns you the cached result. – realli Jan 26 '16 at 14:13
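
A minimal sketch of that approach, using a requests.Session so the User-Agent is attached to every request (the player IDs and season are just the values used above):

import requests

# One session with a browser-like User-Agent; every request made
# through it will carry the header automatically
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'})

for player_id in (203082, 203083):
    resp = session.get('http://stats.nba.com/stats/playergamelog',
                       params={'PlayerID': player_id, 'Season': '2015-16', 'SeasonType': 'Regular Season'})
    resp.raise_for_status()  # no 400 once the header is present
    print resp.url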