I'm trying to crawl pages from one particular domain using Heritrix.
The crawl rate is very slow. One thing I notice is that although there are 25 threads, 24 of them are always idle: only one thread seems to be actively taking URIs from the queue and fetching data from the server.
Rates: 0.33 URIs/sec (0.34 avg); 18 KB/sec (20 avg)
Load: 1 active of 25 threads; 1 congestion ratio; 13193 deepest queue; 13193 average depth
Elapsed: 1h32m3s424ms
Threads: 25 threads: 24 ABOUT_TO_GET_URI, 1 ABOUT_TO_BEGIN_PROCESSOR; 24 noActiveProcessor, 1 fetchHttp
Frontier: RUN - 2 URI queues: 1 active (1 in-process; 0 ready; 0 snoozed); 0 inactive; 0 ineligible; 0 retired; 1 exhausted
Memory: 79933 KiB used; 143508 KiB current heap; 253440 KiB max heap
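The Threads line shows 24 threads waiting in ABOUT_TO_GET_URI, and the Frontier line shows only 2 URI queues, one of them exhausted, with all 13193 URIs in the remaining one. A plausible reading, assuming Heritrix groups URIs into queues by host and lets only one thread work on a given queue at a time (for politeness), is that a single-domain crawl yields a single queue and therefore a single busy thread. The sketch below is a simplified, hypothetical model of that rule, not Heritrix's Frontier code; `QueueModel`, `queueKey`, `buildQueues` and `activeThreads` are illustrative names.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueueModel {
    // Hypothetical queue key: the URI's host. This approximates a per-host
    // queue-assignment policy (an assumption, not Heritrix's implementation).
    static String queueKey(String uri) {
        return URI.create(uri).getHost();
    }

    // Groups URIs into per-host queues, preserving discovery order.
    static Map<String, Deque<String>> buildQueues(List<String> uris) {
        Map<String, Deque<String>> queues = new LinkedHashMap<>();
        for (String u : uris) {
            queues.computeIfAbsent(queueKey(u), k -> new ArrayDeque<>()).add(u);
        }
        return queues;
    }

    // Each non-empty queue can be handed to at most one thread at a time,
    // so busy threads are capped by the number of non-empty queues.
    static int activeThreads(int threads, Map<String, Deque<String>> queues) {
        long nonEmpty = queues.values().stream().filter(q -> !q.isEmpty()).count();
        return (int) Math.min(threads, nonEmpty);
    }

    public static void main(String[] args) {
        List<String> oneDomain = List.of(
            "http://example.com/a", "http://example.com/b", "http://example.com/c");
        List<String> threeHosts = List.of(
            "http://example.com/a", "http://www.example.org/b", "http://example.net/c");
        System.out.println(activeThreads(25, buildQueues(oneDomain)));  // 1
        System.out.println(activeThreads(25, buildQueues(threeHosts))); // 3
    }
}
```

If this model matches Heritrix's behaviour, adding threads would not raise parallelism for one domain; only spreading that domain's URIs over more queues would.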
Is there any configuration I can use to make use of all 25 threads? I've already found and changed the politeness settings (min/max delay). Thanks!
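Because the politeness delay is applied per queue, it also caps how fast that single queue can be drained, regardless of thread count. A minimal sketch, assuming the delay is computed as a factor times the last fetch's duration, clamped between the min and max delay (check this against your Heritrix version); `PolitenessModel` and its methods are hypothetical, and the values in `main` are made-up examples, not settings from this crawl.

```java
public class PolitenessModel {
    // Assumed rule: wait delayFactor x last fetch duration,
    // raised to at least minDelayMs and then capped at maxDelayMs.
    static long politenessDelayMs(long fetchDurationMs, double delayFactor,
                                  long minDelayMs, long maxDelayMs) {
        long delay = (long) (delayFactor * fetchDurationMs);
        delay = Math.max(delay, minDelayMs);
        return Math.min(delay, maxDelayMs);
    }

    // One queue serves one URI per (fetch time + delay), so this is its ceiling rate.
    static double maxUrisPerSecond(long fetchDurationMs, long delayMs) {
        return 1000.0 / (fetchDurationMs + delayMs);
    }

    public static void main(String[] args) {
        long fetchMs = 500; // hypothetical average fetch duration
        long delay = politenessDelayMs(fetchMs, 5.0, 3000, 30000);
        System.out.println(delay);                                      // 3000
        System.out.printf("%.3f%n", maxUrisPerSecond(fetchMs, delay));  // 0.286
    }
}
```

At a single-queue rate r, the 13193 queued URIs take about 13193 / r seconds; at the observed 0.34 URIs/sec average that is roughly 38,800 s, or close to 11 hours. Lowering the delays raises this ceiling, but under the assumptions above the work still flows through one thread.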