
I need to get all URLs from all pages of a given domain. I think it makes sense to use background jobs, placing them on multiple queues. I tried the cobweb gem, but it seems very confusing, and Anemone takes a very long time when there are a lot of pages:

require 'anemone'

# Crawl the domain and print every link found on each page.
Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.links
  end
end
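
To combine this with background jobs, here is a minimal sketch of what I have in mind, assuming Sidekiq; CrawlPageWorker and the 'crawl' queue name are illustrative. The crawl loop only discovers URLs and hands each page off to a worker, so multiple Sidekiq processes could drain different queues in parallel:

require 'anemone'
require 'sidekiq'

# Hypothetical worker; 'crawl' is an illustrative queue name.
class CrawlPageWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'crawl'

  def perform(url)
    # Process the page here; for now, just record the URL.
    puts url
  end
end

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    # Enqueue each discovered page instead of processing it inline.
    CrawlPageWorker.perform_async(page.url.to_s)
  end
end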

What do you think would fit me best?

Aydar Omurbekov

1 Answer


You can use Apache Nutch. It is a highly extensible and scalable open-source web crawler software project.
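
A typical crawl cycle with Nutch 1.x, sketched from the official tutorial (the crawl/ and urls/ paths and the output directory name are illustrative):

# Seed the crawl database with the start URLs listed under urls/
bin/nutch inject crawl/crawldb urls

# One fetch cycle: select URLs, fetch and parse them, feed new links back
bin/nutch generate crawl/crawldb crawl/segments
segment=$(ls -d crawl/segments/* | tail -1)
bin/nutch fetch "$segment"
bin/nutch parse "$segment"
bin/nutch updatedb crawl/crawldb "$segment"

# Repeat the cycle as needed, then dump all collected URLs
bin/nutch readdb crawl/crawldb -dump urls_out

Repeating the generate/fetch/parse/updatedb cycle walks deeper into the site; the readdb dump then contains every URL Nutch has seen.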

ajknzhol