
I am using the following Ruby code to scrape LinkedIn public profiles. I have tried two gems:

1) Using the 'mechanize' gem


    require 'mechanize'
    require 'nokogiri'
    require 'open-uri'

    # Create an agent that identifies itself as desktop Safari
    agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari 4' }
    agent.follow_meta_refresh = true

    # Load the login page and locate the login form by its CSS class
    page = agent.get('https://www.linkedin.com/login')
    login_form = page.form_with(class: 'login__form')

    # Fill in the credentials and submit the form
    login_form.session_key = 'my_email_id'
    login_form.session_password = 'my_password'
    page = agent.submit(login_form, login_form.buttons.first)

2) Using the 'watir' gem


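    require 'nokogiri'
    require 'open-uri'
    require 'webdrivers'
    require 'watir'

    # Start headless Chrome and open the login page
    browser = Watir::Browser.new :chrome, headless: true
    browser.goto 'https://www.linkedin.com/login'

    # Type the email first, then the password, and press Return only once
    browser.input(name: 'session_key').send_keys('my_email_id')
    browser.input(name: 'session_password').send_keys('my_password', :return)

    # Keep the HTML of the page that loads after login
    html = browser.html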

When I run this on my local machine (Ubuntu), LinkedIn does not send a security code, because I have previously logged in to this account from the local Chrome browser. LinkedIn recognizes the browser, returns the proper response, and I am able to scrape the details.

But when I run the same code in production (a Linux EC2 instance with Google Chrome and ChromeDriver installed), LinkedIn does not recognize the browser. Instead of logging me in, it sends a security code to my email, so I do not get the right response and am not able to scrape the profile.
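
To make the difference between the two environments visible, here is a minimal sketch of how the outcome of a login attempt could be classified from the URL the session ends up on. The helper name `login_outcome` is hypothetical, and the `/checkpoint/` and `/login` path prefixes are assumptions about how LinkedIn routes the verification and login pages, not documented behavior:

```ruby
require 'uri'

# Hypothetical helper: classify a login attempt by the URL it landed on.
# ASSUMPTION: verification pages live under '/checkpoint/' and a failed
# login stays under '/login' or '/uas/login'. Adjust if the site differs.
def login_outcome(url)
  path = URI.parse(url).path
  return :verification_required if path.start_with?('/checkpoint/')
  return :login_failed if path.start_with?('/login', '/uas/login')

  :logged_in
end
```

With Watir this could be called as `login_outcome(browser.url)`, and with Mechanize as `login_outcome(page.uri.to_s)`. It only reports which case occurred; it does not get past the security code.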

Is there any approach to resolve this issue or get past the security check, given that I am using the correct LinkedIn credentials?

vivek
  • Did you consider using their API instead of trying to scrape their site? – spickermann Jun 15 '21 at 06:54
  • I have seen a lot of articles about LinkedIn's APIs, and they clearly state that we cannot get much of the useful information, such as candidates' current and past employment details and educational background, through the API. So I started with scraping. I have this working on my local system, but in production, login requires a security code that LinkedIn sends to my email. – vivek Jun 15 '21 at 07:43
  • This does not answer your question in any way but LinkedIn will absolutely detect what you're doing and will absolutely block you. Scraping data off LinkedIn without using their APIs is difficult and they have extremely sophisticated technology to detect and block such access. (e.g., they look at traffic from AWS as being immediately suspicious) If you're doing this in some way that can be linked to your personal LinkedIn profile then expect to have your account permanently terminated for violating their AUP. – anothermh Jun 16 '21 at 19:07
