I am trying to get the following URL with requests.get() in Python 3.x: http://www.finanzen.net/suchergebnis.asp?strSuchString=DE0005933931 (this URL consists of a base URL with the search string DE0005933931).

In a browser, the request gets redirected (via HTTP status code 301) to http://www.finanzen.net/etf/ishares_core_dax%AE_ucits_etf_de (the URL contains the byte 0xAE, which is the ® character in Latin-1). Using requests.get() with the redirected URL works as well.

When I request the search string URL with Python 2.7, everything works and I get the redirected response. With Python 3.x I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 21: invalid start byte

The code snippet to test this:

import requests

url_1 = 'http://www.finanzen.net/suchergebnis.asp?strSuchString=LU0274208692'
# redirected to http://www.finanzen.net/etf/db_x-trackers_msci_world_index_ucits_etf_1c
url_2 = 'http://www.finanzen.net/suchergebnis.asp?strSuchString=DE0005933931'
# redirected to http://www.finanzen.net/etf/ishares_core_dax%AE_ucits_etf_de

print(requests.get(url_1).status_code)  # working
print(requests.get(url_2).status_code)  # error with Python 3.x

Some more information:

  • I am working on Windows 7 using Python 3.6.3 with requests.__version__ = '2.18.4' but I get the same error with other Python versions as well (3.4, 3.5).
  • Using other search strings, everything works with Python 3.x as well, e.g. http://www.finanzen.net/suchergebnis.asp?strSuchString=LU0274208692
  • Interestingly, I even get an Internal Server Error from https://www.hurl.it when trying to GET the above-mentioned URL. Maybe it is not a Python problem.

Any idea why this works in Python 2.7 but not in Python 3.x, and what I can do about it?

bastelflp

1 Answer

The server responds with a Location header whose URL contains raw Latin-1 bytes instead of being percent-encoded (non-ASCII bytes are shown here as 0x?? hex escapes):

Location: /etf/ishares_core_dax0xAE_ucits_etf_de

The 0xAE byte there is not a valid URL character; the server is violating standards here. What they should be sending is

Location: /etf/ishares_core_dax%AE_ucits_etf_de

or

Location: /etf/ishares_core_dax%C2%AE_ucits_etf_de

Both forms use percent-escapes: the first for the Latin-1 encoding of the URL, the second for its UTF-8 encoding.
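As a rough check of the two escaped forms, the standard library's urllib.parse.quote can produce both from the ® character. This is only a sketch; the variable names are illustrative and not part of requests. The last few lines also show that the raw 0xAE byte is not valid UTF-8 on its own, at the same position 21 reported in the error from the question.

```python
from urllib.parse import quote

# '\u00ae' is the registered-trademark sign; the path is taken from the question.
path = '/etf/ishares_core_dax\u00ae_ucits_etf_de'

# Percent-encode the same path using two different byte encodings.
latin1_escaped = quote(path, encoding='latin-1')
utf8_escaped = quote(path, encoding='utf-8')

print(latin1_escaped)  # /etf/ishares_core_dax%AE_ucits_etf_de
print(utf8_escaped)    # /etf/ishares_core_dax%C2%AE_ucits_etf_de

# The unescaped Latin-1 bytes cannot be decoded as UTF-8.
raw = path.encode('latin-1')
try:
    raw.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)
```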

We can patch requests to be more robust in the face of this error, by returning the Location header unchanged when decoding it fails:

from requests.sessions import SessionRedirectMixin

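def get_redirect_target(
        self, resp, _orig=SessionRedirectMixin.get_redirect_target):
    # _orig captures the original method at definition time.
    try:
        return _orig(self, resp)
    except UnicodeDecodeError:
        # The header is not valid UTF-8; return it as received.
        return resp.headers['location']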

SessionRedirectMixin.get_redirect_target = get_redirect_target

With this patch applied the redirects work as expected.
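The fallback can be illustrated without any network access. The sketch below assumes, as the error message suggests, that on Python 3 the header value arrives as a str decoded as Latin-1 and that requests then tries to reinterpret those bytes as UTF-8. The helper redirect_target is hypothetical and only mirrors that logic for a single header value.

```python
def redirect_target(location):
    """Hypothetical helper: mimic the patched handling of one Location value."""
    try:
        # Assumed behaviour: reinterpret the Latin-1-decoded header as UTF-8.
        return location.encode('latin-1').decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the header exactly as received.
        return location


# Header as this server sends it: a lone 0xAE byte, read as Latin-1.
bad = '/etf/ishares_core_dax\xae_ucits_etf_de'
# A header carrying raw UTF-8 bytes (0xC2 0xAE), also read as Latin-1.
good = '/etf/ishares_core_dax\xc2\xae_ucits_etf_de'

print(redirect_target(bad) == bad)  # True: the fallback path was taken
print(redirect_target(good))        # the two bytes collapse into one character
```

In the first case the value comes back unchanged, which is what lets the redirect proceed with the patch applied.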

I created a pull request to improve Location handling.

Martijn Pieters