I can already fetch web-page data, but now I want to fetch it through a proxy. How can I do that?
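import urllib.request
import lxml.html as lh  # assumed: lh refers to lxml.html

def get_main_html():
    # URL and headers are defined elsewhere
    request = urllib.request.Request(URL, headers=headers)
    doc = lh.parse(urllib.request.urlopen(request))
    return doc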
From the documentation:
urllib will auto-detect your proxy settings and use those. This is through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that’s a good thing, but there are occasions when it may not be helpful. One way to do this is to setup our own ProxyHandler, with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler.
See https://docs.python.org/3/howto/urllib2.html#proxies
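The quoted passage describes overriding the auto-detected proxy by installing a ProxyHandler with no proxies defined. A minimal sketch of that step follows; the function name build_direct_opener is made up for illustration, and no request is actually sent.

```python
import urllib.request


def build_direct_opener():
    """Return an opener that ignores auto-detected proxy settings."""
    # An empty mapping means "no proxies"; passing it explicitly stops
    # ProxyHandler from reading the environment's proxy configuration.
    no_proxy = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy)


opener = build_direct_opener()
# opener.open(url) would now connect directly.
# urllib.request.install_opener(opener) would make urlopen() use it too.
```

Passing a non-empty mapping to ProxyHandler instead should route requests through the given proxy, which is closer to what the question asks for.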
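In Python 2, urllib.urlopen accepts a proxies argument (urllib.request.urlopen in Python 3 does not):
import urllib
proxies = {'http': 'http://myproxy.example.com:1234'}
print "Using HTTP proxy %s" % proxies['http']
urllib.urlopen("http://yoursite", proxies=proxies)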
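Since the question uses urllib.request (Python 3), a rough Python 3 counterpart may be more useful. The sketch below reuses the placeholder proxy address from this answer; build_proxy_opener and fetch are hypothetical helper names, and the proxy host is not a real server.

```python
import urllib.request

# Placeholder proxy address taken from the answer above (not a real server).
PROXIES = {"http": "http://myproxy.example.com:1234"}


def build_proxy_opener(proxies):
    """Return an opener that routes the listed URL schemes through a proxy."""
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


def fetch(url, proxies, headers=None):
    """Fetch url through the proxy and return the raw response body."""
    request = urllib.request.Request(url, headers=headers or {})
    opener = build_proxy_opener(proxies)
    with opener.open(request) as response:
        return response.read()
```

In the question's get_main_html, this would amount to replacing urllib.request.urlopen(request) with opener.open(request) and handing the response to lh.parse as before.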
You can use SocksiPy (the socks module). A Python 2 example:
import ftplib
import telnetlib
import urllib2
import socks
# Set the proxy information
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, 'localhost', 9050)
# Route an FTP session through the SOCKS proxy
socks.wrapmodule(ftplib)
ftp = ftplib.FTP('cdimage.ubuntu.com')
ftp.login('anonymous', 'support@aol.com')
print ftp.dir('cdimage')
ftp.close()
# Route a telnet connection through the SOCKS proxy
socks.wrapmodule(telnetlib)
tn = telnetlib.Telnet('achaea.com')
print tn.read_very_eager()
tn.close()
# Route an HTTP request through the SOCKS proxy
socks.wrapmodule(urllib2)
print urllib2.urlopen('http://www.whatismyip.com/automation/n09230945.asp').read()
In your case:
import urllib.request
import lxml.html as lh  # assumed: lh refers to lxml.html
import socks

# Set the proxy information
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, 'localhost', 9050)
# Wrap urllib.request, the submodule that imports socket
socks.wrapmodule(urllib.request)

def get_main_html():
    request = urllib.request.Request(URL, headers=headers)
    doc = lh.parse(urllib.request.urlopen(request))
    return doc