I have set up Nutch 1.17 to crawl a few thousand domains, following inlinks only. One of my main requirements is that the home pages must be revisited repeatedly (say, every 2 hours), and if a new page appears, only that page should be crawled.

What is the best way to do this? I am thinking of running the injector job again and again to recrawl the home pages. Is that the right approach? Also, how can I ensure that the inlinks are still fetched over time?

Hafiz Muhammad Shafiq
