If you know which keys are going to have an unusually large number of values, you could use the following trick. You could implement a custom Partitioner that sends each of your skewed keys to a partition of its own, while everything else gets distributed across the remaining partitions by its hashCode (which is what the default HashPartitioner does).
You can create a custom Partitioner by implementing this interface:
public interface Partitioner<K, V> extends JobConfigurable {
    int getPartition(K key, V value, int numPartitions);
}
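For example, a skew-aware Partitioner could look roughly like the sketch below. It assumes Text keys and IntWritable values, and "hot_key" stands in for whatever key you know to be skewed; none of these names come from your setup.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class CustomPartitioner implements Partitioner<Text, IntWritable> {

    // Key assumed to carry a disproportionate number of values.
    private static final String SKEWED_KEY = "hot_key";

    @Override
    public void configure(JobConf job) {
        // Nothing to configure in this sketch; the skewed keys could also
        // be read from the JobConf here instead of being hard-coded.
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // With a single reducer there is nothing to separate.
        if (numPartitions == 1) {
            return 0;
        }
        // Send the skewed key to a partition of its own (the last one).
        if (key.toString().equals(SKEWED_KEY)) {
            return numPartitions - 1;
        }
        // Spread every other key over the remaining partitions by hashCode,
        // which is essentially what HashPartitioner does.
        return (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }
}

Note that the reducer handling the skewed key still receives all of that key's values; the gain is that it no longer has to process other keys on top of them.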
And then you can tell Hadoop to use your Partitioner with:
conf.setPartitionerClass(CustomPartitioner.class);
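That call goes in your job driver. A hypothetical driver using the old org.apache.hadoop.mapred API might look like this (class names, paths, and the reducer count are placeholders, and the mapper/reducer setup is omitted):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SkewAwareJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SkewAwareJob.class);
        conf.setJobName("skew-aware-job");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Route keys through the custom partitioner defined above.
        conf.setPartitionerClass(CustomPartitioner.class);
        // Use enough reducers that the skewed key's partition is not shared.
        conf.setNumReduceTasks(10);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}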