
I want to create a multi-threaded web crawler, but after doing some research I've discovered that the Mechanize gem is not thread-safe. So my question is: is it possible to write a multi-threaded crawler that scrapes multiple search engines at the same time? Example:

require 'nokogiri'
require 'rest-client'
require 'mechanize'

def site(url)
  Nokogiri::HTML(RestClient.get(url))
end

def parse(url, tag, i)
  parsing = site(url)
  parsing.css(tag)[i].to_s
end

threads = []

threads << Thread.new do
  agent = Mechanize.new
  # do some searching and start the search
  parse('http://google.com', 'html', 0)
end

threads << Thread.new do
  agent = Mechanize.new
  # same thing and run them in tandem
  parse('http://duckduckgo.com', 'html', 0)
end

threads.each(&:join)
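
For what it's worth, a minimal sketch of the pattern I'm aiming for: each thread builds its *own* agent so nothing non-thread-safe is shared, and a thread-safe `Queue` collects the results. The `fetch` method below is a stub standing in for the real `Nokogiri::HTML(RestClient.get(url))` call, so this runs without network access or any gems:

```ruby
# Stub fetcher standing in for Nokogiri::HTML(RestClient.get(url)),
# so the sketch runs without network access.
def fetch(url)
  "<html from #{url}>"
end

results = Queue.new # Queue is thread-safe; a plain Array/Hash is not

threads = ['google.com', 'duckduckgo.com'].map do |url|
  Thread.new do
    # agent = Mechanize.new  # one agent PER THREAD, never shared
    results << [url, fetch(url)]
  end
end

threads.each(&:join) # wait for every crawl to finish

puts results.size # 2 results, one per search engine
```

As long as each `Mechanize.new` instance stays inside the thread that created it, the library's lack of thread safety shouldn't matter, since no two threads ever touch the same object.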
JasonBorne
  • Why would this not be possible? I'm not entirely sure if `mechanize` can run in multiple threads at a time, but it may be possible? – 13aal Apr 20 '16 at 18:07
  • Also possible duplicate: http://stackoverflow.com/questions/903143/is-using-threads-and-ruby-mechanize-safe – 13aal Apr 20 '16 at 18:08
  • FWIW: there are existing multi-threaded crawlers (e.g. [anemone](https://github.com/chriskite/anemone)) that you could use. – orde May 06 '16 at 17:16

0 Answers