Hello!
A question for anyone who uses Scrapinghub, shub-image, Selenium + PhantomJS, and Crawlera. My English is not great, sorry.
I need to scrape a site with a lot of JS code, so I use Scrapy + Selenium. It also has to run on Scrapy Cloud. I wrote a spider that uses Scrapy + Selenium + PhantomJS and ran it on my local machine; everything was fine. Then I deployed the project to Scrapy Cloud using shub-image. The deployment went fine, but the result of webdriver.page_source is different: it is correct locally, while in the cloud I get HTML containing a 403 message even though the HTTP request itself returns 200. So I decided to use my Crawlera account. I added it with:
service_args = [
    '--proxy="proxy.crawlera.com:8010"',
    '--proxy-type=https',
    '--proxy-auth="apikey"',
]
For Windows (local):
self.driver = webdriver.PhantomJS(executable_path=r'D:\programms\phantomjs-2.1.1-windows\bin\phantomjs.exe',service_args=service_args)
For the Docker instance:
self.driver = webdriver.PhantomJS(executable_path=r'/usr/bin/phantomjs', service_args=service_args, desired_capabilities=dcap)
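For completeness, dcap is a PhantomJS desired-capabilities dict, roughly like the sketch below (the user-agent string is only an example value, not necessarily what I actually use):

from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Sketch: PhantomJS desired capabilities with a custom user agent.
# The user-agent string here is just an example placeholder.
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
)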
Again, everything is fine locally, but not in the cloud. I've checked the Crawlera dashboard and it looks fine: requests are sent from both the local machine and the cloud.
Note again, the same Crawlera proxies are used in both cases.
Response on Windows (local): HTTP 200, HTML with the correct content.
Response on Scrapy Cloud (Docker instance): HTTP 200, but the HTML contains a 403 (Forbidden) message.
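One more check I can run from both environments is a plain request through Crawlera, bypassing PhantomJS entirely, to see whether the proxy itself returns the right page. A minimal sketch (with "apikey" and the target URL as placeholders):

import requests

# Crawlera is used as a regular HTTP proxy; the API key is the proxy user.
proxies = {
    "http": "http://apikey:@proxy.crawlera.com:8010",
    "https": "http://apikey:@proxy.crawlera.com:8010",
}
# verify=False because Crawlera re-signs HTTPS responses with its own CA
# unless its certificate is installed.
r = requests.get("https://target-site.example", proxies=proxies, verify=False)
print(r.status_code, len(r.text))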
I don't understand what's wrong. I suspect it might be a difference between the PhantomJS versions (Windows vs. Linux).
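To compare them, I could print the version the driver itself reports in both environments (a quick sketch, assuming the Selenium driver exposes its capabilities dict):

# Print the browser name and version reported by the running driver,
# so the local and cloud values can be compared side by side.
print(self.driver.capabilities.get("browserName"),
      self.driver.capabilities.get("version"))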
Any ideas?