Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site. The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful.
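A minimal sketch of that DSL; the start URL, options, and selector below are placeholders, not part of the library's documentation:

require 'anemone'

# Crawl a site up to three links deep, skip admin pages, and print each page title.
Anemone.crawl("http://example.com", :depth_limit => 3, :threads => 4) do |anemone|
  anemone.skip_links_like(/\/admin\//)

  anemone.on_every_page do |page|
    title = page.doc && page.doc.at_css('title')
    puts "#{page.url} #{title && title.text}"
  end
end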
Questions tagged [anemone]
38 questions
0 votes · 0 answers
break statement in loop not working
I am new to the anemone gem. I have written the following code:
anemone.on_every_page do |page|
  if page.url.to_s.match(/\-ad$/)
    unless page.url.to_s.match("restaurant|hotel")
      p "not useful url: #{page.url}"
      count +=…

Joy · 4,197 · 14 · 61 · 131
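For the break-in-loop question above: a block given to on_every_page is called once per crawled page, so `next` is the usual way to skip a page that isn't interesting, rather than trying to `break` out of the block. A rough sketch reusing the URL checks from the question:

anemone.on_every_page do |page|
  # `next` skips only the current page; the crawl keeps running
  next unless page.url.to_s =~ /-ad$/
  next if page.url.to_s =~ /restaurant|hotel/
  p "not useful url: #{page.url}"
end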
0 votes · 0 answers
Ruby open_uri always 404. (allow https redirects git version)
I'm using the open-uri module, which allows https redirects.
What I'm trying to do is open every page from a domain. I do this by first crawling it with anemone:
require 'anemone'
require "./open_uri"
class Query
  def initialize()
    fs =…

Bula · 2,398 · 5 · 28 · 54
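Worth noting for the open-uri question above: Anemone fetches each page during the crawl, so the response is already on the page object and a second fetch with open-uri is often unnecessary. A rough sketch, assuming the domain comes in via ARGV:

require 'anemone'

Anemone.crawl(ARGV[0]) do |anemone|
  anemone.on_every_page do |page|
    # page.code is the HTTP status and page.body the raw response body
    size = page.body ? page.body.length : 0
    puts "#{page.code} #{page.url} (#{size} bytes)"
  end
end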
0 votes · 1 answer
web crawler in rails, how to crawl all pages of the site
I need to get all URLs from all pages of the given domain.
I think it makes sense to use background jobs, placing them on multiple queues.
I tried the cobweb gem, but it seems very confusing,
and anemone runs for a long time if…

Aydar Omurbekov · 2,047 · 4 · 27 · 53
0 votes · 2 answers
Getting all URLs using anemone gem (very large site)
The site I want to index is fairly big, 1.x million pages. I really just want a JSON file of all the URLs so I can run some operations on them (sorting, grouping, etc.).
The basic anemone loop worked well:
require…

mustacheMcGee · 481 · 6 · 19
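For the large-site question above, a rough sketch of collecting only the URLs and dumping them to JSON; :discard_page_bodies keeps memory down on big crawls, and the start URL and output filename are placeholders:

require 'anemone'
require 'json'

urls = []

Anemone.crawl("http://example.com", :discard_page_bodies => true) do |anemone|
  anemone.on_every_page do |page|
    urls << page.url.to_s
  end
end

File.write("urls.json", JSON.pretty_generate(urls))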
0 votes · 1 answer
How to handle NILs with Anemone / Nokogiri web scraper?
def scrape!(url)
  Anemone.crawl(url) do |anemone|
    anemone.on_pages_like %[/events/detail/.*] do |page|
      show = {
        headliner: page.doc.at_css('h1.summary').text,
        openers: page.doc.at_css('.details h2').text
        …

GN. · 8,672 · 10 · 61 · 126
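For the NIL-handling question above: at_css returns nil when a selector matches nothing, so calling .text on the result raises NoMethodError; guarding each lookup is the usual fix. A sketch using the selectors from the question, with the rest assumed:

anemone.on_pages_like(/\/events\/detail\/.*/) do |page|
  headliner = page.doc && page.doc.at_css('h1.summary')
  openers   = page.doc && page.doc.at_css('.details h2')

  show = {
    headliner: headliner ? headliner.text : nil,
    openers:   openers ? openers.text : nil
  }
end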
0 votes · 1 answer
anemone print links on first page
I wanted to see what I was doing wrong here.
I need to print the links on the parent page, even if they are for another domain, and then stop.
require 'anemone'
url = ARGV[0]
Anemone.crawl(url, :depth_limit => 1) do |anemone|
  anemone.on_every_page do…

tven · 547 · 6 · 18
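For the first-page-links question above, one way to print every link on the start page, including off-domain ones, is to keep the crawl shallow and read the anchors straight off the page's Nokogiri doc. A sketch, with the URL taken from ARGV as in the question:

require 'anemone'

url = ARGV[0]

# :depth_limit => 0 should keep the crawl to the start page only
Anemone.crawl(url, :depth_limit => 0) do |anemone|
  anemone.on_every_page do |page|
    next unless page.doc
    page.doc.css('a[href]').each { |a| puts a['href'] }
  end
end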
-1 votes · 1 answer
How to scrape products from site with ruby/anemone/nokogiri
Is it possible to scrape the products from an ecommerce site using the anemone and nokogiri libs in ruby?
I understand how to pull the data I need from each product page using nokogiri, but I can't figure out how to make anemone/nokogiri crawl the…

Dan · 641 · 9 · 25
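For the product-scraping question above, the usual pattern is to let Anemone handle the crawling and restrict the Nokogiri extraction to product-page URLs with on_pages_like. The shop URL, URL pattern, and CSS selectors below are purely hypothetical placeholders:

require 'anemone'

products = []

Anemone.crawl("http://shop.example.com") do |anemone|
  # Hypothetical product URL pattern; adjust to the target site's structure
  anemone.on_pages_like(/\/products\/[\w-]+$/) do |page|
    next unless page.doc
    name  = page.doc.at_css('h1')
    price = page.doc.at_css('.price')
    products << { name: name && name.text, price: price && price.text }
  end
end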
-2 votes · 1 answer
Anemone - NoMethodError: undefined method `xpath' for nil:NilClass
I'm just starting to learn about writing web crawlers in Ruby. This one is designed to crawl my blog and find broken external links, using the Anemone gem and the rake task below...
task :testing_this => :environment do
  require 'anemone'
  …

NorthernMarketer · 1 · 2
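For the NoMethodError question above: page.doc is only built for HTML responses, so images, PDFs, and failed fetches leave it nil, and any xpath call on it then raises. Checking the doc before using it is a common guard. A sketch inside a rake task, with the blog URL as a placeholder:

task :testing_this => :environment do
  require 'anemone'

  Anemone.crawl("http://myblog.example.com") do |anemone|
    anemone.on_every_page do |page|
      # page.doc is nil for non-HTML responses, so guard before calling xpath
      next unless page.doc
      page.doc.xpath('//a/@href').each do |href|
        puts "#{page.url} -> #{href}"
      end
    end
  end
end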