
What is the best open-source web crawler tool written in Java?

Kara
cuneytykaya

2 Answers


Try crawler4j. You just need to implement a simple interface which controls which URLs to visit and what to do with each crawled page.
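crawler4j's model comes down to two decisions: which URLs to follow, and what to do with each fetched page (as far as I know, in crawler4j these are the `shouldVisit` and `visit` methods you override in a `WebCrawler` subclass). The following is not crawler4j's API. It is a minimal plain-JDK sketch of that two-hook design, assuming a hypothetical `CrawlPolicy` interface and `TinyCrawler` class, with an in-memory map of URL to outgoing links standing in for HTTP fetching and HTML link extraction, so it runs without a network.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the two decisions crawler4j asks you to make (not its real API).
interface CrawlPolicy {
    boolean shouldVisit(String url); // follow this link?
    void visit(String url);          // handle a fetched page
}

final class TinyCrawler {
    // Breadth-first crawl over an in-memory link graph (url -> outgoing links).
    // The map is a hypothetical substitute for real fetching and link extraction.
    static void crawl(Map<String, List<String>> web, String seed,
                      CrawlPolicy policy, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(seed);
        seen.add(seed);
        int visited = 0;
        while (!frontier.isEmpty() && visited < maxPages) {
            String url = frontier.poll();
            List<String> links = web.get(url);
            if (links == null) {
                continue; // page not in the graph: treat it as a failed fetch and skip it
            }
            policy.visit(url);
            visited++;
            for (String link : links) {
                // seen.add returns false for URLs already scheduled, so each URL is queued at most once
                if (seen.add(link) && policy.shouldVisit(link)) {
                    frontier.add(link);
                }
            }
        }
    }
}
```

For example, a policy whose `shouldVisit` returns `url.startsWith("https://a.example/")` keeps the crawl on one host. Because a URL is added to `seen` before the policy is consulted, a rejected URL is not re-evaluated every time another page links to it.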

Scott Wardlaw
Andy
I have problems crawling HTTPS websites with this crawler ("site failed to respond", although the site opens fine in a browser). – ed22 Sep 29 '17 at 08:29

In Java, I think it boils down to Nutch vs. Heritrix. You should specify your requirements to get a better answer.

riffraff