
I'm writing a script that uses urllib2 to read the results of a search query on zhaopin.com.

When I try to open the url by copying it to my web browser (Chrome), I have no problem opening the site: http://sou.zhaopin.com/jobs/searchresult.ashx?p=1&isadv=0&bj=160000&in=160200

When I open the URL with urllib2, I get the error message HTTPError: HTTP Error 502: Bad Gateway. Searching Google didn't help me figure out what I'm doing wrong.

import urllib
import urllib2

# query-string parameters for the search
data = {}
data['in'] = '160200'
data['bj'] = '160000'
data['isadv'] = '0'
data['p'] = 1

url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?'
url_values = urllib.urlencode(data)
full_url = url + url_values
print full_url
# open full_url (base URL plus query string), not the bare base url
response = urllib2.urlopen(full_url)
html = response.read()
response.close()

Perhaps the problem is the URL itself: after I open it in Chrome, the 'http://' disappears from the address bar. I'd appreciate any help figuring this out.

SGolt
  • Are you behind the Great Firewall of China? Try capturing the HTTP session using Wireshark and look at the raw data. The difference in the requests should be visible there. – John Zwinck Aug 06 '17 at 07:34
  • The disappearing http in the address bar is just a Chrome feature, nothing else. – Rajan Chauhan Aug 06 '17 at 07:42
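Following up on the Wireshark suggestion: one difference such a capture typically shows between Chrome and urllib2 is the User-Agent header (urllib2 sends something like Python-urllib/2.7). Whether zhaopin.com actually treats that header differently is an assumption, not something confirmed here. The sketch below is Python 3, where urllib.request and urllib.parse replace urllib2 and urllib.urlencode; it builds the same search URL as the question and attaches a browser-like User-Agent. The function name build_search_request and the 'Mozilla/5.0' string are hypothetical illustrations.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = 'http://sou.zhaopin.com/jobs/searchresult.ashx'


def build_search_request(page, user_agent='Mozilla/5.0'):
    """Build (but do not send) the search request from the question.

    build_search_request and the default user_agent value are hypothetical.
    """
    # same parameters as in the question, in the same order as the browser URL
    params = {'p': page, 'isadv': '0', 'bj': '160000', 'in': '160200'}
    url = BASE_URL + '?' + urlencode(params)
    # assumption: a browser-like User-Agent instead of the default Python-urllib one
    return Request(url, headers={'User-Agent': user_agent})


req = build_search_request(1)
print(req.full_url)
print(req.get_header('User-agent'))  # urllib capitalizes header names this way
```

Passing req to urllib.request.urlopen(req) would send the request; comparing its capture with Chrome's is the check the comment suggests.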

2 Answers


Try urllib instead of urllib2:

# use full_url (with the query string) as built in the question
response = urllib.urlopen(full_url)
html = response.read()
response.close()
gre_gor
Surinder Batti
HTTP Error 502: Bad Gateway

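A 502 Bad Gateway error means that a server acting as a gateway or proxy received an invalid response from the upstream server it forwarded your request to. This often points to a misconfiguration on the server side, or to the upstream server rebooting or being unavailable at that moment.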

It can also result from poor IP communication between back-end computers, possibly including the server at the site you are trying to visit, or from that server being overloaded.
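If the 502 is caused by one of these transient conditions (a rebooting or overloaded server), retrying after a short pause can help; if it is caused by something persistent, retrying will not. Below is a minimal Python 3 sketch of retrying with exponential backoff. The helper open_with_retry, the set of retryable codes, and the delay values are assumptions for illustration, not part of urllib. The opener and sleep parameters exist so the function can be exercised without a network connection.

```python
import time
import urllib.error
import urllib.request

# gateway-type status codes treated as transient (an assumption)
RETRYABLE_CODES = {502, 503, 504}


def open_with_retry(url, opener=urllib.request.urlopen, retries=3,
                    delay=1.0, sleep=time.sleep):
    """Open url, retrying gateway-type HTTP errors with exponential backoff.

    Hypothetical helper: waits delay, 2*delay, 4*delay, ... between attempts
    and re-raises the last error once retries are exhausted.
    """
    for attempt in range(retries + 1):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # non-transient errors (e.g. 404) and the final failure propagate
            if err.code not in RETRYABLE_CODES or attempt == retries:
                raise
            sleep(delay * (2 ** attempt))
```

In practice this would be called as open_with_retry(full_url), with full_url built as in the question.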

You can also use urllib itself, instead of urllib2, to open the URL:

import urllib

# query-string parameters for the search
data = {}
data['in'] = '160200'
data['bj'] = '160000'
data['isadv'] = '0'
data['p'] = 1

url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?'
url_values = urllib.urlencode(data)
full_url = url + url_values
print full_url
# open full_url (base URL plus query string), not the bare base url
response = urllib.urlopen(full_url)
html = response.read()
response.close()
Pravitha V