I'm running a Hadoop job on Amazon Elastic MapReduce and I keep getting an OutOfMemoryError. The values are admittedly larger than most MapReduce values, but even when I decrease their size dramatically the error still occurs. Here's the stack trace:
Error: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1698)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1558)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1407)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1339)
I'm not sure what code to show, since this stack trace is entirely outside the scope of my own code. The version is Hadoop 0.20.205.
Is there some way to configure the reducer to read fewer values at a time? Shouldn't that be handled automatically based on the available memory?
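For context, here's a rough sketch of the kind of configuration I imagine might be relevant, using the shuffle-related properties I believe exist in 0.20 (mapred.child.java.opts, mapred.job.shuffle.input.buffer.percent, mapred.reduce.parallel.copies). I'm not certain these are the right knobs, which is really what I'm asking:

import org.apache.hadoop.mapred.JobConf;

public class ShuffleMemoryConfig {
    public static void main(String[] args) {
        // Placeholder job class; my real job setup goes here.
        JobConf conf = new JobConf(ShuffleMemoryConfig.class);

        // Give each task child JVM a bigger heap.
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        // Use a smaller fraction of the reducer heap for buffering map
        // outputs in memory during the shuffle (I believe the default is 0.70).
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.30f);

        // Fetch fewer map outputs in parallel.
        conf.setInt("mapred.reduce.parallel.copies", 2);

        // ... remaining job setup and JobClient.runJob(conf) would go here ...
    }
}

Is adjusting properties like these the intended way to deal with large values in the shuffle, or is there a setting specifically for limiting how much the reducer pulls in at once?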