Python requests returns a different web page from the browser or urllib

Question

I use requests to scrape a web page for some content.
When I use

import requests
requests.get('https://example.org')

I get a different page from the one I get when I use my browser or when I use

import urllib.request
urllib.request.urlopen('https://example.org')

I tried using urllib, but it was really slow.
In a comparison test I ran, it was 50% slower than requests!

How do you solve this?

score 4 · Answer 1 · answered Apr 08 '17 at 23:39

After a lot of investigation, I found that the site sets a cookie in the response header on a visitor's first request only.

So the solution is to get the cookies with a HEAD request, then send them back with your GET request:

import requests
# get the cookies with head(); this doesn't fetch the body, so it's fast
cookies = requests.head('https://example.com').cookies
# send the GET request with those cookies
result = requests.get('https://example.com', cookies=cookies)

Now it's faster than urllib and returns the same result :)
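
To check whether a site behaves this way, it can help to look at which cookies the first cookie-less response actually sets. Here is a minimal sketch of that check; the helper name `first_visit_cookies` and the `timeout` default are my own choices for illustration, not part of requests or of the original answer.

```python
import requests


def first_visit_cookies(url, timeout=10):
    """Return the cookies a server sets on a cookie-less HEAD request.

    Hypothetical helper for illustration; the name and the timeout
    default are assumptions, not part of the requests API.
    """
    # A bare requests.head() call sends no stored cookies, so the
    # server should treat it like a first visit.
    response = requests.head(url, timeout=timeout)
    response.raise_for_status()
    # response.cookies is a RequestsCookieJar; get_dict() flattens it
    # into a plain {name: value} dictionary.
    return response.cookies.get_dict()
```

If this returns an empty dict for your site, the difference between the pages probably comes from something else (for example the request headers) rather than a first-visit cookie.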

Mohamed El-Saka

Or you can use a `Session` instance. It will automatically manage cookies for you with a `CookieJar`. – Dashadower Apr 09 '17 at 03:09
I tried that, but in my case the cookie was sent with the first request only and I didn't want to reuse the same cookie in subsequent requests, so I simply passed the cookies to the get request. Still, your suggestion is valid for most other cases – Mohamed El-Saka Apr 10 '17 at 04:17
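
For reference, the `Session` approach from the comments might look roughly like the sketch below. The function name `fetch_with_session` and the `timeout` default are assumptions made for illustration. Opening a new `Session` on every call is one possible way to keep cookies from carrying over between unrelated fetches, which seems to be the concern raised above.

```python
import requests


def fetch_with_session(url, timeout=10):
    """Fetch url through a short-lived Session so cookies carry over.

    Hypothetical helper for illustration; the name and the timeout
    default are assumptions.
    """
    # A new Session starts with an empty cookie jar, so nothing from
    # earlier calls is reused.
    with requests.Session() as session:
        # First request: any Set-Cookie header in the response is
        # stored in session.cookies.
        session.head(url, timeout=timeout)
        # Second request: the stored cookies are sent automatically.
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
```

A long-lived `Session` shared across many requests would also reuse the underlying connection, at the cost of reusing the same cookies everywhere.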
