I'm trying to scrape product information from Adidas, but when I make a GET request using PhantomJS the session hangs and never receives a response. The same request works with Selenium driving Chrome, both normally and headless. My assumption is that some kind of firewall or browser fingerprinting is blocking PhantomJS. I need to use authenticated proxies, so Selenium with Chrome isn't an option for me.
Is anyone able to help me get this working using PhantomJS? Here is my original script:
from selenium import webdriver
import base64

# Placeholder proxy credentials and address
username = 'proxyusername'
password = 'proxypassword'
proxy = 'host:port'

# Route PhantomJS traffic through the HTTP proxy
service_args = [
    '--proxy=http://' + proxy,
    '--proxy-type=http',
]

# Build the Basic auth token; decode() keeps it a text string on Python 3 as well
login = username + ':' + password
authentication_token = 'Basic ' + base64.b64encode(login.encode()).decode()

# Copy the defaults so the shared capabilities dict is not modified
capa = webdriver.DesiredCapabilities.PHANTOMJS.copy()
capa['phantomjs.page.customHeaders.Proxy-Authorization'] = authentication_token
capa['phantomjs.page.settings.userAgent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
driver = webdriver.PhantomJS('/usr/local/bin/phantomjs', desired_capabilities=capa, service_args=service_args)

print('Getting URL')
driver.get('http://www.adidas.co.uk')  # the session hangs here
print('Request made')
html_source = driver.page_source
print(html_source)
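
To rule out a malformed header as the cause, the token construction from the script can be checked on its own, without launching PhantomJS. This is only a minimal sketch: the function name `basic_proxy_auth` is hypothetical (it is not part of Selenium or PhantomJS), and it assumes the proxy expects standard HTTP Basic credentials.

```python
import base64


def basic_proxy_auth(username, password):
    """Return a Basic auth header value for the given credentials.

    Hypothetical helper for illustration; it mirrors the token built
    in the script and works on both Python 2 and Python 3.
    """
    login = '{0}:{1}'.format(username, password)
    # b64encode works on bytes, so encode first and decode the result
    token = base64.b64encode(login.encode('utf-8')).decode('ascii')
    return 'Basic ' + token


if __name__ == '__main__':
    # Placeholder credentials, as in the script
    print(basic_proxy_auth('proxyusername', 'proxypassword'))
```

If the printed value looks right, the header itself is probably not what makes the request hang.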