I am using Nutch 1.10 to crawl websites for my organization, on a machine with 16 GB of RAM. At the moment Nutch uses only 3-4 GB of RAM while crawling, and the crawl takes almost 10 hours to finish. Is there a way to configure Nutch to use more of the available memory, say 12 GB, to finish the same task faster? All suggestions are welcome!
1 Answer
Under the assumption that the script bin/nutch or bin/crawl is used for crawling in local mode (no Hadoop cluster): the environment variable NUTCH_HEAPSIZE defines the JVM heap size in MB.
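
For example, from the Nutch installation directory (a minimal sketch: 12288 MB is just one plausible size for a 16 GB machine, and the bin/crawl arguments are placeholders rather than values from the question; they vary between Nutch versions):

    # Allocate a 12 GB heap; NUTCH_HEAPSIZE is given in MB
    export NUTCH_HEAPSIZE=12288
    # Placeholder arguments: seed dir, crawl dir, Solr URL, number of rounds
    bin/crawl urls/ crawlDir/ http://localhost:8983/solr 2

Because bin/crawl delegates to bin/nutch, which reads NUTCH_HEAPSIZE when building the java command line, every crawl step launched from this shell inherits the larger heap.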

– Sebastian Nagel
- Thanks for your answer. Can you please tell me where to set NUTCH_HEAPSIZE? I have a total of 30 different config files with 30 different seed.txt files. Thanks in advance, sir. – UMA MAHESWAR Jan 24 '19 at 09:04
- FYI, the total RAM in my machine is 16 GB and I am using this machine only to crawl and index the data into Solr. – UMA MAHESWAR Jan 24 '19 at 09:06
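
Regarding the comment about 30 configurations: NUTCH_HEAPSIZE is an ordinary environment variable, so it does not go into any of the config files; export it once in the shell (or in a wrapper script) and every bin/crawl run started from there inherits it. A hypothetical wrapper, assuming each configuration lives in its own directory conf-01 … conf-30 (this layout, and mapping each configuration to NUTCH_CONF_DIR, are illustrative assumptions, not details from the original post):

    #!/bin/bash
    # Hypothetical wrapper: set the heap size once; every crawl inherits it.
    export NUTCH_HEAPSIZE=12288                  # heap size in MB (12 GB)
    for conf in conf-*/; do
        name=$(basename "$conf")
        # NUTCH_CONF_DIR tells bin/nutch which configuration directory to use;
        # the seed dir, crawl dir, Solr URL, and round count are placeholders.
        NUTCH_CONF_DIR="$conf" bin/crawl "$conf/urls" "crawl-$name" \
            http://localhost:8983/solr 2
    done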