
We can crawl a whole website with Anemone (e.g. https://stackoverflow.com/), but what if I want to focus only on a certain folder (e.g. https://stackoverflow.com/questions)? How can I do this? Maybe with the `focus_crawl` method?

Ghilas BELHADJ

1 Answer


Check the `keep_if` method; it may help:

http://danneu.com/posts/8-scraping-a-blog-with-anemone-ruby-web-crawler-and-mongodb#toc_1

Try passing it the pattern you want to crawl.

There is also a gist: https://gist.github.com/1149906

NOTE: I haven't tested this, but it is worth a try.
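
For illustration, here is a rough, untested-against-the-live-site sketch of how `keep_if` could be combined with `focus_crawl` to restrict which links get followed. The constant `QUESTIONS` and the helpers `focused_links` and `crawl_questions` are hypothetical names, and the pattern is only an assumption about what "the /questions folder" should match.

```ruby
# Hypothetical pattern: URLs under /questions on stackoverflow.com (http or https).
QUESTIONS = %r{\Ahttps?://stackoverflow\.com/questions(?:/|\z)}

# Return only the links that match the pattern.
# Array#keep_if mutates its receiver, so filter a copy and leave the input alone.
def focused_links(links, pattern = QUESTIONS)
  links.dup.keep_if { |link| pattern.match?(link.to_s) }
end

# Crawl from start_url, following only the links kept by focused_links.
# Needs the anemone gem; it is required lazily so the helper above works without it.
def crawl_questions(start_url)
  require "anemone"
  Anemone.crawl(start_url) do |anemone|
    # The focus_crawl block receives each page and returns the links to follow.
    anemone.focus_crawl { |page| focused_links(page.links) }
    anemone.on_every_page { |page| puts page.url }
  end
end
```

Calling `crawl_questions("https://stackoverflow.com/questions")` should then print each visited URL, assuming the gem is installed and the site allows the crawl.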

Pritesh Jain
    Thank you, PriteshJ, but I finally found the answer. I used `on_pages_like` instead of `on_every_page`, with a pattern like this: `on_pages_like(/http:\/\/stackoverflow\.com\/questions\/./)`, and it works well. Thank you again. – Ghilas BELHADJ Aug 08 '12 at 18:03
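
A minimal sketch of the `on_pages_like` approach from the comment, with the pattern written out as a complete regex literal and the dot in the host name escaped. `QUESTION_PAGES` and `handle_question_pages` are hypothetical names. As far as I understand Anemone, `on_pages_like` only decides which pages the block runs on; the crawler still follows links elsewhere unless something like `focus_crawl` or `skip_links_like` narrows it.

```ruby
# Pattern from the comment: an http URL with at least one character after /questions/.
QUESTION_PAGES = /http:\/\/stackoverflow\.com\/questions\/./

# Run the block only for pages whose URL matches QUESTION_PAGES.
# Needs the anemone gem; required lazily so the pattern can be checked without it.
def handle_question_pages(start_url)
  require "anemone"
  Anemone.crawl(start_url) do |anemone|
    anemone.on_pages_like(QUESTION_PAGES) { |page| puts page.url }
  end
end
```

Note that the pattern as written matches `http://` URLs only; an `https?` prefix would be needed to cover both schemes.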