We're doing a simple Pig join between a small table and a big, skewed table.
We can't use Pig's skewed join (USING 'skewed')
because of another bug (a Pig skewed join with a big table fails with "Split metadata size exceeded 10000000") :(
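For reference, the join is shaped roughly like this (a sketch; the relation, field, and path names are placeholders, not our real script):

small  = LOAD 'small_table' USING PigStorage() AS (key:chararray, small_val:chararray);
big    = LOAD 'big_table'   USING PigStorage() AS (key:chararray, big_val:chararray);
-- a plain hash join; USING 'skewed' is what we would prefer, but it hits the bug above
joined = JOIN big BY key, small BY key PARALLEL 1000;
STORE joined INTO 'joined_output';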
With the default mapred.job.shuffle.input.buffer.percent=0.70,
some of our reducers fail in the shuffle stage:
org.apache.hadoop.mapred.Task: attempt_201305151351_21567_r_000236_0 :
Map output copy failure : java.lang.OutOfMemoryError: GC overhead limit exceeded
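For completeness, here is roughly how the value can be overridden per script (a sketch; Pig's SET statement forwards Hadoop job properties, and the same property could instead be passed with -D on the pig command line or in the job configuration):

-- lower the reducers' in-memory shuffle buffer from the 0.70 default
SET mapred.job.shuffle.input.buffer.percent 0.30;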
If we lower it to mapred.job.shuffle.input.buffer.percent=0.30,
the job finishes cleanly, although it takes 2 hours (3 of the 1000 reducers we use lag far behind the rest), and in the logs of those lagging reducers we see something like this:
SpillableMemoryManager: first memory handler call-
Usage threshold init = 715849728(699072K) used = 504241680(492423K) committed = 715849728(699072K) max = 715849728(699072K)
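To put those numbers in perspective: max = 715849728 bytes is roughly 683 MB of reducer heap, and the memory handler fires at used = 504241680 bytes (about 481 MB, i.e. about 70% of the heap); with the default mapred.job.shuffle.input.buffer.percent=0.70, the shuffle buffer alone is allowed to take up to 0.70 × 683 MB ≈ 478 MB of that same heap (assuming we understand the parameter correctly).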
Why does this happen? Why doesn't the SpillableMemoryManager protect us from failing when the shuffle input buffer is at 70%?