This is the API provided by luminati.io, a premium proxy provider. However, it returns the response as bytes instead of a dictionary, so the bytes are converted into a dictionary in order to extract the ip and port:
Every request will end up with a new peer proxy because the IPs rotate for every request.
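The bytes-to-dictionary conversion mentioned above can be sketched like this (the response body is a made-up example shaped like lumtest.com's output, not a real response):

```python
import json

# Hypothetical raw response from http://lumtest.com/myip.json (a bytes object).
proxy_details = b'{"ip": "84.22.151.191", "country": "RU", "asn": {"asnum": 57129}}'

# json.loads accepts bytes directly on Python 3.6+; decoding first also
# works on older versions.
proxy_dictionary = json.loads(proxy_details.decode('utf-8'))
print(proxy_dictionary["ip"])  # the peer IP reported by lumtest
```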
#!/usr/bin/env python
import csv
import json
import sys
import time

import requests

print('If you get error "ImportError: No module named \'six\'" '
      'install six:\n$ sudo pip install six')

if sys.version_info[0] == 2:
    import six
    from six.moves.urllib import request
    opener = request.build_opener(
        request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()
if sys.version_info[0] == 3:
    import urllib.request
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()

proxy_dictionary = json.loads(proxy_details)
print(proxy_dictionary)
Then I plan to use the ip and port in the requests module to connect to the website of interest:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":
    search_keyword = input("Enter the search keyword: ")
    page_number = int(input("Enter total number of pages: "))
    for i in range(1, page_number + 1):
        time.sleep(10)
        link = ('https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input'
                '&page=' + str(i) + '&q=' + str(search_keyword)
                + '&spm=a2o4l.home.search.go.239e6ef06RRqVD')
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link, headers=headers, proxies={"https": proxy})
But my issue is that it errors out at the requests.get call. When I change proxies={"https": proxy} to proxies={"http": proxy}, the request went through once, but other than that the proxy fails to connect.
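For context on the proxies argument: requests selects the proxy entry by the scheme of the URL being fetched (an https:// link uses the "https" entry, an http:// link the "http" entry), and each value should itself carry a scheme. A minimal sketch of the mapping, using the hypothetical peer address from the sample output:

```python
# requests picks the proxy by the scheme of the target URL, and each
# value should include a scheme of its own.
proxy = "84.22.151.191:57129"  # hypothetical peer address from the sample output
proxies = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,  # HTTPS traffic can still tunnel through an HTTP-scheme proxy
}
# requests.get(link, headers=headers, proxies=proxies)
print(proxies["https"])
```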
Sample output:
{'ip': '84.22.151.191', 'country': 'RU', 'asn': {'asnum': 57129, 'org_name': 'Optibit LLC'}, 'geo': {'city': 'Krasnoyarsk', 'region': 'KYA', 'postal_code': '660000', 'latitude': 56.0097, 'longitude': 92.7917, 'tz': 'Asia/Krasnoyarsk'}}
The details of the peer proxy are shown in the image below:
print(proxy) will yield 84.22.151.191:57129, which is fed into the requests.get method.
The error I get:
(Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000282DDD592B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))
I tested removing the proxies={"https": proxy} argument from the requests.get call, and the scraping works with no error. So either the proxy has an issue, or the way I access it does.