I am trying to attach a custom (Java) partitioner to my MapReduce streaming job. I am using this command:
../bin/hadoop jar ../contrib/streaming/hadoop-streaming-1.2.1.jar \
-libjars ./NumericPartitioner.jar -D mapred.map.tasks=12 -D mapred.reduce.tasks=36 \
-input /input -output /output/keys -mapper "map_threeJoin.py" -reducer "keycount.py" \
-partitioner newjoin.NumericPartitioner -file "map_threeJoin.py" \
-cmdenv b_size=6 -cmdenv c_size=6
The important bit of that is the file NumericPartitioner.jar, which resides in the same folder the command is run from (one level down from the Hadoop installation root). Here is its code:
package newjoin;

import java.util.*;
import java.lang.*;

import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.io.*;

public class NumericPartitioner extends Partitioner<Text, Text>
{
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks)
    {
        return Integer.parseInt(key.toString().split("\\s")[0]) % numReduceTasks;
    }
}
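As a side note, the key-parsing logic itself behaves as I expect when exercised outside Hadoop. The sketch below mirrors the expression in getPartition with hypothetical sample keys and the job's reducer count of 36 (the class name PartitionCheck and the helper partitionFor are just for this check, not part of the job):

```java
// Standalone check of the same key-parsing expression used in getPartition.
// Sample keys and the helper name are hypothetical, for illustration only.
public class PartitionCheck {
    static int partitionFor(String key, int numReduceTasks) {
        // Identical logic to NumericPartitioner.getPartition:
        // take the first whitespace-delimited token, parse it, mod by reducers.
        return Integer.parseInt(key.split("\\s")[0]) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("42\tfoo", 36)); // 42 % 36 = 6
        System.out.println(partitionFor("7 bar", 36));   // 7 % 36 = 7
        // Caveats: a non-numeric leading token throws NumberFormatException,
        // and a negative number yields a negative partition, which Hadoop
        // rejects; Math.floorMod would guard against the latter.
    }
}
```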
And yet, when I try to run the above command, I get:
-partitioner : class not found : newjoin.NumericPartitioner
Streaming Command Failed!
What's going on here, and how can I get MapReduce to find my partitioner?