
I am using Nutch 1.10 to crawl websites for my organization, on a machine with 16 GB of RAM. At the moment Nutch uses only 3-4 GB of RAM while crawling, and the crawl takes almost 10 hours to finish. Is there a way to configure Nutch to use more of the available RAM (say, 12 GB) to finish the same task faster? All suggestions are welcome!

UMA MAHESWAR

1 Answer


Under the assumption that the script bin/nutch or bin/crawl is used for crawling in local mode (no Hadoop cluster): the environment variable NUTCH_HEAPSIZE defines the JVM heap size in MB.
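A minimal sketch of how this could look, assuming a local-mode crawl launched from the Nutch installation directory; the seed directory, crawl directory, Solr URL, and round count below are placeholders, and the exact bin/crawl arguments may differ for your setup:

    # Heap size in MB, read by bin/nutch (default is 1000 MB); 12288 MB = 12 GB.
    export NUTCH_HEAPSIZE=12288
    # Hypothetical invocation: seed dir, crawl dir, Solr URL, number of rounds.
    bin/crawl urls/ crawlDir/ http://localhost:8983/solr/ 2

Since NUTCH_HEAPSIZE is an environment variable, exporting it once in the shell (or in a wrapper script) applies to every subsequent bin/nutch and bin/crawl invocation, regardless of which of your config files or seed.txt files a given crawl uses.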

Sebastian Nagel
  • Thanks for your answer. Can you please tell me where to set NUTCH_HEAPSIZE? I have a total of 30 different config files with 30 different seed.txt files. Thanks in advance, sir. – UMA MAHESWAR Jan 24 '19 at 09:04
  • FYI, total RAM in my machine is 16 GB and I am using this machine only to crawl and index the data into Solr. – UMA MAHESWAR Jan 24 '19 at 09:06