
I am using Nutch 1.10 to crawl websites for my organization, on a machine with 16 GB of RAM. At the moment Nutch uses only 3-4 GB of RAM while crawling, and the crawl takes almost 10 hours to finish. Is there a way to configure Nutch to use more of the available RAM (say, 12 GB) to finish the same task faster? All suggestions are welcome!

UMA MAHESWAR

1 Answer


Under the assumption that the script bin/nutch or bin/crawl is used for crawling in local mode (no Hadoop cluster): the environment variable NUTCH_HEAPSIZE defines the JVM heap size in MB.
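A minimal sketch of how this could look, assuming a local-mode crawl launched from the Nutch installation directory; the seed directory, crawl directory, Solr URL, and round count below are placeholders, and the exact bin/crawl arguments may differ for your setup:

    # Heap size in MB, read by bin/nutch (default is 1000 MB); 12288 MB = 12 GB.
    export NUTCH_HEAPSIZE=12288
    # Hypothetical invocation: seed dir, crawl dir, Solr URL, number of rounds.
    bin/crawl urls/ crawlDir/ http://localhost:8983/solr/ 2

Since NUTCH_HEAPSIZE is an environment variable, exporting it once in the shell (or in a wrapper script) applies to every subsequent bin/nutch and bin/crawl invocation, regardless of which of your config files or seed.txt files a given crawl uses.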

Sebastian Nagel
  • Thanks for your answer. Can you please tell me where to set NUTCH_HEAPSIZE? I have a total of 30 different config files with 30 different seed.txt files. Thanks in advance, sir. – UMA MAHESWAR Jan 24 '19 at 09:04
  • FYI, total RAM in my machine is 16 GB and I am using this machine only to crawl and index the data into Solr. – UMA MAHESWAR Jan 24 '19 at 09:06