Python Requests: requests.exceptions.TooManyRedirects: Exceeded 30 redirects

Question

I was trying to crawl this page using the python-requests library:

import requests
from lxml import etree,html

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'
r = requests.get(url)
tree = etree.HTML(r.text)
print(tree)

but I got the above error (TooManyRedirects). I tried the allow_redirects parameter, but I get the same error:

r = requests.get(url, allow_redirects=True)

I even tried sending headers and data along with the URL, but I'm not sure this is the correct way to do it:

headers = {'content-type': 'text/html'}
payload = {'ie':'UTF8','node':'976419031'}
r = requests.post(url,data=payload,headers=headers,allow_redirects=True)

How can I resolve this error? Out of curiosity I also tried beautiful-soup4 and got a different error of the same kind:

page = BeautifulSoup(urllib2.urlopen(url))

urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently

`allow_redirects=True` is the default; the problem isn't that you don't follow redirects, the problem is that the server *keeps redirecting you*. Probably because you don't accept cookies.. — Martijn Pieters, May 14 '14 at 10:26
A session doesn't appear to help. The URL you are accessing redirects to `http://www.amazon.in/b?ie=UTF8&node=976419031`, which redirects to `http://www.amazon.in/electronics/b?ie=UTF8&node=976419031`. The latter redirects to itself. — Martijn Pieters, May 14 '14 at 10:31

Martijn Pieters · Accepted Answer · 2017-12-18T11:27:45.190

Amazon is redirecting your request to http://www.amazon.in/b?ie=UTF8&node=976419031, which in turn redirects to http://www.amazon.in/electronics/b?ie=UTF8&node=976419031, after which you have entered a loop:

>>> loc = url
>>> seen = set()
>>> while True:
...     r = requests.get(loc, allow_redirects=False)
...     loc = r.headers['location']
...     if loc in seen: break
...     seen.add(loc)
...     print(loc)
... 
http://www.amazon.in/b?ie=UTF8&node=976419031
http://www.amazon.in/electronics/b?ie=UTF8&node=976419031
>>> loc
http://www.amazon.in/b?ie=UTF8&node=976419031

So your original URL A redirects to a new URL B, which redirects to C, which redirects to B, etc.
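The loop check above can be wrapped in a small reusable function. This is only a sketch: `trace_redirects`, `get_location` and `requests_location` are names made up for illustration, and the `max_hops` default of 30 simply mirrors the limit in the error message. Unlike the interactive loop, it stops cleanly when a response is not a redirect and resolves relative `Location` values.

```python
from urllib.parse import urljoin


def trace_redirects(url, get_location, max_hops=30):
    """Follow redirects one hop at a time and report any loop.

    get_location(url) must return the Location header value of the
    response for url, or None when the response is not a redirect.
    Returns (chain, loop_start): chain is the list of URLs visited in
    order; loop_start is the URL that was about to be revisited, or None
    when the chain ended normally or max_hops was reached.
    """
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        location = get_location(chain[-1])
        if location is None:
            return chain, None              # final, non-redirect response
        nxt = urljoin(chain[-1], location)  # Location may be relative
        if nxt in seen:
            return chain, nxt               # loop detected
        seen.add(nxt)
        chain.append(nxt)
    return chain, None


def requests_location(url):
    """A get_location callable backed by requests (needs network access)."""
    import requests  # third-party; imported lazily so the tracer is stdlib-only

    response = requests.get(url, allow_redirects=False)
    if response.is_redirect:
        return response.headers["Location"]
    return None
```

Calling `trace_redirects(url, requests_location)` against the URL from the question should return the chain of URLs plus the one that repeats, assuming the site still behaves as described here.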

Apparently Amazon does this based on the User-Agent header, at which point it sets a cookie that following requests should send back. The following works:

>>> s = requests.Session()
>>> s.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'
>>> r = s.get(url)
>>> r
<Response [200]>

This creates a session (for ease of reuse and for cookie persistence) and sets its User-Agent header to a copy of the Chrome user agent string. The request then succeeds (returns a 200 response).
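To confirm which redirects were actually followed once the request succeeds, you can inspect `response.history`. The helper below is a sketch only: `get_with_browser_ua` and `BROWSER_UA` are illustrative names, the optional `session` parameter exists just so that a stand-in object can be passed for testing, and whether this particular user agent string is still accepted by the site is an assumption.

```python
# User-Agent string copied from the answer above; whether the site still
# accepts it is an assumption.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"
)


def get_with_browser_ua(url, session=None, user_agent=BROWSER_UA):
    """Fetch url through a session that sends a browser-like User-Agent.

    Returns (response, hops), where hops lists the (status_code, url)
    pairs of the redirects followed on the way to the final response.
    """
    if session is None:
        import requests  # third-party; imported lazily

        session = requests.Session()
    session.headers["User-Agent"] = user_agent
    response = session.get(url)
    hops = [(r.status_code, r.url) for r in response.history]
    return response, hops
```

A non-empty `hops` list together with a 200 status on `response` would indicate that the redirects ended instead of looping.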

This works for me on Windows, but not on a Ubuntu VM.. any ideas why? — ProgSnob, Mar 31 '18 at 06:30
@ProgSnob no, especially not when you don’t tell me how it doesn’t work, sorry. — Martijn Pieters, Mar 31 '18 at 07:49

score 5 · Answer 2 · edited Jul 07 '17 at 11:33

You can raise the redirect limit by setting max_redirects explicitly on a session, as in the example below:

session = requests.Session()
session.max_redirects = 60
session.get('http://www.amazon.com')

edited Jul 07 '17 at 11:33 · J-Alex
answered Jul 07 '17 at 10:06 · PrabaKaran D

score 0 · Answer 3 · answered Sep 05 '17 at 10:26

You need to copy the cookie value into your request headers. It works on my end.
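One possible way to do this, assuming the cookie value means the `Cookie` request header as shown in the browser's developer tools: parse that string into a dict and pass it through the `cookies` parameter of requests. The function names below are made up for illustration, and the cookie names the site actually expects are not given in this answer.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(raw):
    """Turn a 'name=value; name2=value2' Cookie header string into a dict."""
    jar = SimpleCookie()
    jar.load(raw)
    return {name: morsel.value for name, morsel in jar.items()}


def get_with_cookies(url, raw_cookie_header, user_agent):
    """Send the copied cookies and a User-Agent along with a GET request."""
    import requests  # third-party; imported lazily

    return requests.get(
        url,
        cookies=parse_cookie_header(raw_cookie_header),
        headers={"User-Agent": user_agent},
    )
```

Cookies copied this way usually expire, so a session that collects them itself is likely to be more robust over time.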

answered Sep 05 '17 at 10:26 · Rocky Chen

How do I do that? – ProgSnob Mar 30 '18 at 12:28
