Questions tagged [anemone]

Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site. The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful.

http://anemone.rubyforge.org/
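As a rough illustration of the DSL described above, here is a minimal sketch of "act on every page" combined with "skip certain URLs". The skip patterns, the `skipped?` helper, the `collect_urls` method name, and the default depth limit are all hypothetical choices made for this example. It also assumes that Anemone matches skip patterns against the link's path, which the helper mirrors. The `anemone` gem is loaded inside the method so the rest of the file can be read and checked without it installed.

```ruby
require 'uri'

# Illustrative skip patterns (assumptions for this sketch, not from the tag wiki).
SKIP_PATTERNS = [%r{/login}, /\.pdf\z/i].freeze

# Pure helper that mirrors the skip rule on the URL path, so the patterns
# can be checked without touching the network.
def skipped?(url)
  path = URI(url).path
  SKIP_PATTERNS.any? { |pattern| path =~ pattern }
end

# Crawl a site and return the URLs of pages that came back with HTTP 200.
def collect_urls(start_url, depth_limit: 2)
  require 'anemone' # third-party gem: gem install anemone

  urls = []
  Anemone.crawl(start_url, depth_limit: depth_limit) do |anemone|
    # Do not follow links whose path matches any of the patterns.
    anemone.skip_links_like(*SKIP_PATTERNS)

    # Runs once for each page the crawler fetches.
    anemone.on_every_page do |page|
      urls << page.url.to_s if page.code == 200
    end
  end
  urls
end
```

With the gem installed, calling `collect_urls('http://example.com/')` should return the crawled URLs as strings. Checking `skipped?` on a few sample URLs first is a cheap way to confirm the patterns before starting a long crawl.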

38 questions

votes

0 answers

break statement in loop not working

I am new to the anemone gem. I have written the following code: anemone.on_every_page do |page| if page.url.to_s.match(/\-ad$/) unless page.url.to_s.match("restaurant|hotel") p "not useful url: #{page.url}" count +=…

ruby anemone

asked Apr 25 '14 at 11:33

Joy

votes

0 answers

Ruby open_uri always 404. (allow https redirects git version)

I'm using the open-uri module which allows https redirects. What I'm trying to do is open every page from a domain. I do this by first crawling it through anemone: require 'anemone' require "./open_uri" class Query def initialize() fs =…

ruby http open-uri anemone

asked Mar 26 '14 at 12:02

Bula

votes

1 answer

Web crawler in Rails: how to crawl all pages of the site

I need to get all URLs from all pages of a given domain. I think it makes sense to use background jobs, placing them on multiple queues. I tried cobweb, but it seems a very confusing gem, and anemone; anemone runs for a long time if…

ruby-on-rails web-crawler resque anemone

asked Oct 11 '13 at 05:32

Aydar Omurbekov

votes

2 answers

Getting all URLs using anemone gem (very large site)

The site I want to index is fairly big, 1.x million pages. I really just want a JSON file of all the URLs so I can run some operations on them (sorting, grouping, etc). The basic anemone loop worked well: require…

ruby anemone

asked Aug 21 '13 at 20:35

mustacheMcGee

votes

1 answer

How to handle NILs with Anemone / Nokogiri web scraper?

def scrape!(url) Anemone.crawl(url) do |anemone| anemone.on_pages_like %[/events/detail/.*] do |page| show = { headliner: page.doc.at_css('h1.summary').text, openers: page.doc.at_css('.details h2').text …

ruby nokogiri scraper anemone

asked Aug 13 '13 at 20:47

GN.

votes

1 answer

anemone print links on first page

I wanted to see what I was doing wrong here. I need to print the links on the parent page, even if they are for another domain, and then exit. require 'anemone' url = ARGV[0] Anemone.crawl(url, :depth_limit => 1) do |anemone| anemone.on_every_page do…

ruby anemone

asked Mar 27 '13 at 05:52

tven

-1

votes

1 answer

How to scrape products from site with ruby/anemone/nokogiri

Is it possible to scrape the products from an e-commerce site using the anemone and nokogiri libs in Ruby? I understand how to pull the data I need from each product page using nokogiri but I can't figure out how to make anemone/nokogiri crawl the…

ruby nokogiri scraper anemone

asked May 20 '12 at 07:02

Dan

-2

votes

1 answer

Anemone - NoMethodError: undefined method `xpath' for nil:NilClass

I'm just starting to learn more about writing a web crawler in Ruby which is designed to crawl my blog and find broken external links using the Anemone gem and the rake task below... task :testing_this => :environment do require 'anemone' …

ruby-on-rails ruby xpath anemone

asked Sep 27 '16 at 10:00

NorthernMarketer
