I am writing a web scraper and want to route its traffic through a proxy, but I can't quite figure out how to do it in Elixir.

I am using Hound running on top of a headless ChromeDriver. I purchased some proxy IPs through https://luminati.io, and they offer both a Chrome extension and a username/password-based proxy server.

The scraper's actions are handled by a GenServer that represents a user scraping the web. The app has no front end; it accepts commands sent through a bot I built on Telegram, so when a user sends the login command, for instance, it triggers the login function of the GenServer.

At that point the GenServer will change the ChromeDriver session using Hound.change_session_to/2 and then log the user in.

This works great, but now I want to send every request through the proxy server via username and password. When changing the session with Hound, it allows the chromeOptions to be set as well.

ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36"
change_session_to(String.to_atom(account.username), %{browserName: "chrome", chromeOptions: %{"args" => ["--user-agent=#{ua}", "--proxy-server=http://user:password@proxy.luminati.io:22225"]}})
navigate_to "https://www.website.com/"

Another thing I have tried is loading Luminati's Chrome extension to proxy the traffic, but I can't get the extension to load for each session. I downloaded the packed CRX Chrome extension and placed it in my priv folder. When the session starts it picks up the user agent just fine, but the extension never starts. When I try to load the extension I am not running headless.

ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36"
priv_dir = :code.priv_dir(:boost_buddy)
change_session_to(String.to_atom(account.username), %{browserName: "chrome",
chromeOptions: %{"extensions" => ['#{priv_dir}/luminati/3.2_1'], "args" => ["--user-agent=#{ua}", "--proxy-server=http://user:password@proxy.luminati.io:22225"]}})
navigate_to "https://www.website.com/"

Does anyone have experience using ChromeDriver with Elixir? With Ruby and Java, setting up the extension is typically no problem.

Ronan Boiteau
Joe Marion

1 Answer

https://github.com/GoogleChrome/puppeteer/issues/659

-1 because this was the top result for googling "chrome headless extension"

Regarding sending each request through the proxy, I think you either need to interface with ChromeDriver yourself (hijacking Hound) or skip Hound and use Chrome either directly or through a Selenium Grid.

I think the issue stems from the fact that Hound starts a single Chrome instance, and the proxy settings are defined on that instance. All further requests go through that proxy.

So to get different proxy connections for different sessions, you either need a way to set them through navigational steps (visiting a proxy website that then serves as a hard proxy) or use separate browser instances altogether. I might be wrong, though, and perhaps there's an easier way of proxying the requests.
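If you do end up building the options for each browser instance yourself, here is a minimal sketch of what that could look like. It is not tested against Hound, and the module name `ChromeCaps`, the function `build/3`, and the proxy address are hypothetical. It rests on two things I'm fairly sure of: ChromeDriver's `extensions` option expects each entry to be the base64-encoded contents of a packed .crx file rather than a path, and a single-quoted literal in Elixir is a charlist, which a JSON encoder would send as a list of integers. It also leaves the credentials out of `--proxy-server`, on the assumption that Chrome does not accept `user:password@` in that flag.

```elixir
defmodule ChromeCaps do
  @moduledoc "Hypothetical helper that builds a capabilities map for one browser session."

  # `proxy` is a proxy URL without credentials (assumption: Chrome does not
  # read user:password from --proxy-server).
  # `crx_paths` is a list of paths to packed .crx files, given as binaries
  # (double-quoted strings), not charlists.
  def build(user_agent, proxy, crx_paths \\ []) do
    %{
      browserName: "chrome",
      chromeOptions: %{
        "args" => ["--user-agent=#{user_agent}", "--proxy-server=#{proxy}"],
        # Each entry is the base64-encoded file contents, not a file path.
        "extensions" => Enum.map(crx_paths, &encode_crx/1)
      }
    }
  end

  # Reads a packed extension from disk and base64-encodes it.
  defp encode_crx(path) do
    path
    |> File.read!()
    |> Base.encode64()
  end
end
```

The resulting map would then be handed to whatever creates the session. How Hound forwards it (for instance under its `additional_capabilities` option) may depend on the Hound version, so check that part against the Hound docs.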

m3characters
  • -1 for thinking I was asking about combining the proxy with loading an extension at the same time. I said that I want to establish a session with a proxy using the chromeOptions `--proxy-server` arg and that it wasn't working. As a last resort I tried loading an extension that would set up a proxy in a regular Chrome (not headless) session and was still unsuccessful. I've done this before in Ruby and was trying to do it in Elixir. – Joe Marion Jan 23 '18 at 01:18
  • Actually, you don't mention that you tried the extension in non-headless mode, and you end by saying that in Ruby and Java it just works, when in fact in headless mode you can't load extensions at all, so you deserve another -1 for not making the distinction – m3characters Jan 23 '18 at 10:18