I'm running Nutch on Elastic MapReduce with 3 worker nodes. I'm using Nutch 1.4 with the default configuration it ships with (after adding a user agent).
However, although I'm crawling a list of 30,000 domains, the fetch step runs on only one worker node, while the parse step runs on all three.
How do I get it to run the fetch step from all three nodes?
*EDIT* The problem was that I needed to set the mapred.map.tasks property to the number of nodes in my Hadoop cluster (3 in my case). You can find this documented here
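
For anyone who lands here later, this is a minimal sketch of what that setting might look like. It assumes the property goes in conf/nutch-site.xml (Hadoop's own mapred-site.xml may work just as well) and that the cluster has 3 worker nodes, as in my case. The file location and the value are assumptions for illustration, so adjust them to your setup:

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the existing <configuration> element -->
<property>
  <name>mapred.map.tasks</name>
  <!-- 3 = number of worker nodes in the cluster described above -->
  <value>3</value>
</property>
```

My understanding is that the generate step uses this value as the number of fetch lists to produce when none is given explicitly, which would explain why it limits how many nodes take part in fetching. If that is right, segments generated before the change would need to be regenerated to pick it up.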