I'm using the following code:

require 'rubygems'
require 'mechanize'
require 'nokogiri'  
require 'open-uri'  
require 'logger'
require 'json'
require 'slowweb'
SlowWeb.limit('linkedin.com', 1, 10)

# create agent
agent = Mechanize.new { |agent| 
  agent.user_agent_alias = 'Mac Firefox'
  agent.log = Logger.new "mech.log" 
}
agent.follow_meta_refresh = true
page = agent.get("https://ca.linkedin.com/")

# login
login_form = page.forms.first
login_form.session_key = "username"
login_form.session_password = "pass"

page = agent.submit(login_form, login_form.buttons.first)
search_page = agent.get("https://www.linkedin.com/vsearch/f?type=all&keywords=Recruiter+Boston")
results = search_page.body.scan(/\{"person"\:\{.*?\}\}/)
results.each do |person|
  json = JSON.parse(person)
  puts json['person']['firstName'] 
  puts json['person']['lastName']
end

This lists people who are my current connections, so I know I'm logged in. But when I run the same search manually in a browser, it lists Boston recruiters as expected.

I suspect my crawler is being recognized and served different results, but if you have any other ideas I'd love to hear them.
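As a sanity check, the scan-and-parse step can be exercised in isolation against a canned response body (the fragment below uses made-up names, not real LinkedIn output), which helps confirm the problem is the page being served rather than the extraction:

```ruby
require 'json'

# Hypothetical body containing two person fragments in the shape
# the regex expects (made-up data, not real LinkedIn markup).
body = 'junk {"person":{"firstName":"Jane","lastName":"Doe"}} more ' \
       'junk {"person":{"firstName":"John","lastName":"Smith"}} tail'

results = body.scan(/\{"person"\:\{.*?\}\}/)
names = results.map do |fragment|
  person = JSON.parse(fragment)['person']
  [person['firstName'], person['lastName']]
end

p names  # => [["Jane", "Doe"], ["John", "Smith"]]
```

If this prints the expected names but the live run returns your connections, the request is being answered with a different page, not mis-parsed.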

  • What have you done to prove/disprove your suspicion? When asking a question like this, we need a minimal HTML sample that works with your code, and your code should be runnable. As is you're asking us to have a LinkedIn account and set up code that can run your code. See "[ask]", including the links at the bottom of the page. I'd recommend using Nokogiri to parse the HTML as it'll help remove the chance of false-positives since regex are not good at handling markup. – the Tin Man Feb 04 '16 at 21:43
  • Code added. LinkedIn search results are spit out in JavaScript, so the JSON is needed vs Nokogiri. Prove/disprove: I've run the script and the names that come back are my current connections; a manual search with the same URL cut/pasted comes back as recruiters from Boston. – user1222303 Feb 04 '16 at 22:47
  • Much better. You might want to look at the real user-agent string for Mac OS Firefox. http://www.useragentstring.com/pages/Firefox/ *If* they're sniffing your user-agent, using the full string could help. Rather than scraping, have you tried using their API? Scraping is sure to be a violation of their TOS, whereas using their API will avoid these sort of problems. – the Tin Man Feb 04 '16 at 23:12
  • Were you able to resolve this? I am having the exact same issue. – Martin Sommer Jun 10 '16 at 04:01
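On the Nokogiri-vs-regex point raised in the comments: even for JSON fragments, the non-greedy pattern is fragile, because it stops at the first `}}` and will truncate any person record whose last field is itself a nested object. A quick demonstration (both fragments are hypothetical, not real LinkedIn output):

```ruby
require 'json'

pattern = /\{"person"\:\{.*?\}\}/

flat   = '{"person":{"firstName":"Jane","lastName":"Doe"}}'
nested = '{"person":{"firstName":"Jane","positions":{"count":2}}}'

# The flat record matches in full and parses cleanly.
JSON.parse(flat.scan(pattern).first)

# The nested record is cut off at the first "}}", dropping the
# final closing brace, so the captured fragment is invalid JSON.
truncated = nested.scan(pattern).first
begin
  JSON.parse(truncated)
rescue JSON::ParserError
  puts "truncated fragment is not valid JSON"
end
```

A tolerant parse loop (rescuing `JSON::ParserError` per fragment) would at least surface how many records the regex is silently mangling.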

0 Answers