Why does Python 3 urllib redirect to Yahoo?

Question

I am using urlopen from urllib.request in Python 3.5.1 (64-bit, on Windows 10) to load content from www.wordreference.com for a French project. Whenever I request any page other than the site root, the content is loaded from yahoo.com instead.

Here, I print the first 350 characters from http://www.wordreference.com:

>>> from urllib import request
>>> page = request.urlopen("http://www.wordreference.com")
>>> content = page.read()
>>> print(content.decode()[:350])
<!DOCTYPE html>

<html lang="en">

<head>

<title>English to French, Italian, German &amp; Spanish Dictionary -
WordReference.com</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta name="description" content="Free online dictionaries - Spanish, French,
Italian, German and more. Conjugations, audio pronunciations and

Next, I requested a specific document on the domain:

>>> page = request.urlopen("http://www.wordreference.com/enfr/test")
>>> content = page.read()
>>> print(content.decode()[:350])
<!DOCTYPE html>
<html id="atomic" lang="en-US" class="atomic my3columns  l-out Pos-r https fp
fp-v2 rc1 fp-default mini-uh-on viewer-right ltr desktop Desktop bkt201">
<head>

<title>Yahoo</title><meta http-equiv="x-dns-prefetch-control" content="on"
<link rel="dns-prefetch" href="//s.yimg.com"><link rel="preconnect"
href="//s.yimg.com"><li

The last request takes about six seconds longer to read (which could be my slow internet) and the content comes straight from http://www.yahoo.com/. I can access the above URLs fine in a web browser.

Why is this happening? Is this something related to Windows 10? I have tried this on other domains and the problem does not occur.
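One way to confirm that a redirect (rather than, say, a DNS problem) is responsible is to compare the requested URL with the URL the response actually came from, which `urlopen` exposes through `geturl()`. The following is a minimal sketch; the helper name `left_original_host` is hypothetical and only illustrates the check.

```python
from urllib.parse import urlsplit


def left_original_host(requested_url, final_url):
    """Return True if final_url is on a different host than requested_url.

    Hypothetical helper: it only compares hostnames, so a redirect that
    stays on the same site is not reported.
    """
    return urlsplit(requested_url).hostname != urlsplit(final_url).hostname


# Intended use (needs network access, so it is left as a comment):
#   from urllib import request
#   url = "http://www.wordreference.com/enfr/test"
#   page = request.urlopen(url)
#   print(page.geturl(), left_original_host(url, page.geturl()))
```

If the final URL is on yahoo.com, the server answered the original request with a redirect that urllib followed automatically.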

wordreference detects you're not using a browser and "blocks" your request by redirecting it. Consider setting a browser user agent in urllib. — Quentin Pradet, Jul 05 '16 at 06:37
I just tried this with Postman and with the same code you posted, and I did not see the same behavior. I also suggest you use the [`urllib3`](https://urllib3.readthedocs.io/en/latest/) or [`requests`](http://docs.python-requests.org/en/master/) modules. — smac89, Jul 05 '16 at 06:37
My guess is that the request the `urllib` library sends is misinterpreted by the server, which then redirects it to Yahoo. The same redirection also happens when using `curl` to send the request: the response is an `Object moved` page (`Object moved to here.`). When using another library, such as `requests`, it works as expected. — DeepSpace, Jul 05 '16 at 06:41
@QuentinPradet That seems to be the solution. After setting a user agent using `urllib.request.Request` the regular page content loads. — cp289, Jul 05 '16 at 23:38
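The fix confirmed in the last comment can be sketched as follows. By default urllib identifies itself as `Python-urllib/<version>`; passing a `headers` dict to `urllib.request.Request` replaces that. The user agent string and the helper names `build_request` and `fetch_text` are assumptions made for illustration, and it is not guaranteed that this particular string is what the site expects.

```python
from urllib import request

# Assumption: an illustrative browser-style user agent string, not a value
# taken from the thread. Any common browser string could be substituted.
BROWSER_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Build a Request that sends user_agent instead of urllib's default."""
    return request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, encoding="utf-8"):
    """Fetch url with the custom user agent and return the decoded body."""
    with request.urlopen(build_request(url)) as page:
        return page.read().decode(encoding)
```

Calling `print(fetch_text("http://www.wordreference.com/enfr/test")[:350])` would then repeat the experiment from the question with the browser-style header in place.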

score -2 · Answer 1 · answered Jul 05 '16 at 06:34

I tried the following code with the `requests` library, and it works:

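import requests

page = requests.get("http://www.wordreference.com/enfr/test")
# page.text is already a decoded str, so slice it directly;
# encoding it back to bytes would print a b'...' literal instead.
content = page.text
print(content[:350])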

answered Jul 05 '16 at 06:34

Sijan Bhandari

