
I am trying out TotalOrderPartitioner in Hadoop, and while doing so I am getting the following error: "wrong key class".

Driver Code -

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;


public class WordCountJobTotalSort {

    public static void main (String args[]) throws Exception
    {
        if (args.length < 2 ) 
        {
            System.out.println("Plz provide I/p and O/p directory ");
            System.exit(-1);
        }

        Job job = new Job();

        job.setJarByClass(WordCountJobTotalSort.class);
        job.setJobName("WordCountJobTotalSort");            
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(WordMapper.class);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setReducerClass(WordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);

        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("/tmp/partition.lst"));

        InputSampler.writePartitionFile(job, new InputSampler.RandomSampler<IntWritable, Text>(1,2,2));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Mapper code -

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;


public class WordMapper extends Mapper <LongWritable,Text,Text, IntWritable >  
{

    @Override
    public void map(LongWritable mkey, Text value, Context context)
            throws IOException, InterruptedException {

        String s = value.toString();

        for (String word : s.split(" "))
        {
            if (word.length() > 0 ){
                context.write(new Text(word), new IntWritable(1));

            }
        }
    }
}

Reducer code -

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;


public class WordReducer extends  Reducer <Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text rkey, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int count=0;

        for (IntWritable value : values){

            count = count + value.get();
        }

        context.write(rkey, new IntWritable(count));    
    }
}

Error -

[cloudera@localhost workspace]$ hadoop jar WordCountJobTotalSort.jar WordCountJobTotalSort file_seq/part-m-00000 file_out
15/05/18 00:45:13 INFO input.FileInputFormat: Total input paths to process : 1
15/05/18 00:45:13 INFO partition.InputSampler: Using 2 samples
15/05/18 00:45:13 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/05/18 00:45:13 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Exception in thread "main" java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1340)
    at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:336)
    at WordCountJobTotalSort.main(WordCountJobTotalSort.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Input File -

[cloudera@localhost workspace]$ hadoop fs -text file_seq/part-m-00000

0 Hello Hello
12 How How
20 is is
26 your your
36 job job

  • This line kind of says it all: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text. Check the line number it occurs on, and change the type. – Stultuske May 18 '15 at 09:12
  • From the error log I am not able to track down the exact line that is causing this error. It seems to me that I am not using the Hadoop Java packages correctly, but I am not sure... I have very limited knowledge of Hadoop. – Nitin Sharma May 18 '15 at 17:52
  • Then look at your own code and the line mentioned there: at WordCountJobTotalSort.main(WordCountJobTotalSort.java:47) – Stultuske May 19 '15 at 06:48

3 Answers

InputSampler samples the mapper's input key (the partition file is written on the client, before the map stage runs), while TotalOrderPartitioner partitions on the mapper's output key. The mapper's input and output key types therefore have to be the same; otherwise the MR framework cannot find a suitable bucket within the sampled space for the output key/value pairs.

In this case the input key is LongWritable, so InputSampler tries to build the partition file from a subset of the LongWritable keys, but the partition file is declared with the map output key class, Text. Appending a LongWritable key to it fails with exactly the "wrong key class" error above.

We can work around the problem by introducing a preparation stage: a preliminary job that rewrites the input as a SequenceFile whose key is already the sort key (Text), so that the sampler and the partitioner see the same key type.
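
A minimal sketch of such a preparation job follows; the class name WordCountPrepare, the map-only structure, and the word/count payload are illustrative assumptions, not code from the original post:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical preparation stage: rewrites plain text input as a
// SequenceFile keyed by Text, so that the later total-order sort job
// samples and partitions the same key type.
public class WordCountPrepare {

    public static class PrepareMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String word : value.toString().split(" ")) {
                if (word.length() > 0) {
                    context.write(new Text(word), ONE);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(WordCountPrepare.class);
        job.setJobName("WordCountPrepare");

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Map-only job: with zero reducers the map output goes straight
        // to a SequenceFile whose key class is Text.
        job.setMapperClass(PrepareMapper.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(0);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The sort job can then read this SequenceFile via SequenceFileInputFormat, so its map input key and map output key are both Text, and InputSampler and TotalOrderPartitioner agree on the key type.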

– bitan

I was getting the same "wrong key class" error. In my case it was because I was using a combiner with a custom Writable; when I commented out the combiner, it worked fine.
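
To illustrate the constraint (this snippet is an editorial sketch, not code from the answer): a combiner runs on the map output and its results are fed back into the same shuffle, so both its input and output key/value classes must equal the map output classes. For the word-count job above, a type-compatible combiner would look like this:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative combiner for the word-count job above. Its input and
// output types both match the map output types (Text, IntWritable);
// emitting any other Writable from a combiner fails at runtime with
// the same kind of "wrong key class" IOException.
public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

It would be registered with job.setCombinerClass(SumCombiner.class); with a custom Writable the same rule applies: the combiner's key/value classes must match what the mapper emits.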

– Bhavesh Gadoya

Comment out these two lines and execute the Hadoop job:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

OK, if that does not work, then after commenting out those two lines you have to set both the input and output format classes:

job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
  • Hi Krunal... After commenting out those two lines I am still getting the same error :( – Nitin Sharma May 19 '15 at 16:22
  • Hi Krunal... I am already using job.setInputFormatClass(SequenceFileInputFormat.class); in my code. Now I have also added job.setOutputFormatClass(SequenceFileOutputFormat.class); but the same error still occurs. – Nitin Sharma May 21 '15 at 06:15