Creating Sequence File Format for Hadoop MR

Question

I am working with Hadoop MapReduce and have a question. Currently, my mapper's input key/value types are LongWritable, LongWritable, and its output key/value types are also LongWritable, LongWritable. The input format is SequenceFileInputFormat. Basically, I want to convert a txt file into a sequence file so that I can use it as input to my mapper.

The input file looks something like this:

1\t2 (key = 1, value = 2)

2\t3 (key = 2, value = 3)

and on and on...

I looked at this thread, How to convert .txt file to Hadoop's sequence file format, but realized that TextInputFormat only supports Key = LongWritable and Value = Text.

Is there any way to read a txt file and write a sequence file with KV = LongWritable, LongWritable?

score 7 · Accepted Answer · answered Sep 03 '12 at 17:45

Sure, basically the same way I described in the other thread you've linked. But you have to implement your own Mapper.

Here is a quick sketch for you:

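import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// TextInputFormat hands the mapper (byte offset, line); we emit (long, long)
public class LongLongMapper extends
    Mapper<LongWritable, Text, LongWritable, LongWritable> {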

  @Override
  protected void map(LongWritable key, Text value,
      Mapper<LongWritable, Text, LongWritable, LongWritable>.Context context)
      throws IOException, InterruptedException {

    // assuming that your line contains key and value separated by \t
    String[] split = value.toString().split("\t");

    context.write(new LongWritable(Long.valueOf(split[0])), new LongWritable(
        Long.valueOf(split[1])));

  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(LongLongMapper.class);

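    // use the mapper defined above; the base Mapper class is an identity
    // mapper and would not parse the lines
    job.setMapperClass(LongLongMapper.class);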
    job.setReducerClass(Reducer.class);

    // increase if you need sorting or a special number of files
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(LongWritable.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));

    // submit and wait for completion
    job.waitForCompletion(true);
  }
}

Each call to your map function receives one line of your input as its value, so we just split it on your delimiter (tab) and parse each part into a long.

That's it.
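
If your input may contain blank or malformed lines, the split-and-parse step described above will throw inside the mapper. A minimal sketch of a defensive version in plain Java (no Hadoop dependency) follows. Note that TabLineParser and parse are hypothetical names made up for this illustration; they are not part of Hadoop or of the mapper above.

```java
// Hypothetical helper (not a Hadoop class): mirrors the mapper's
// split-on-tab and parse-as-long step, but tolerates bad input.
final class TabLineParser {

  private TabLineParser() {
  }

  // Returns {key, value} for a line of the form "key\tvalue",
  // or null if the line does not hold exactly two parseable longs.
  static long[] parse(String line) {
    String[] parts = line.split("\t");
    if (parts.length != 2) {
      return null; // blank line, missing tab, or extra columns
    }
    try {
      long key = Long.parseLong(parts[0].trim());
      long value = Long.parseLong(parts[1].trim());
      return new long[] { key, value };
    } catch (NumberFormatException e) {
      return null; // one of the fields is not a number
    }
  }
}
```

Inside map you could call TabLineParser.parse(value.toString()) and skip the record when it returns null, rather than letting the whole task fail on a single bad line.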

Thank you, I got lots of ideas from the skeleton and was able to create a seq. file writer. — user1566629, Sep 04 '12 at 18:44
If you have another example, please send it to me so I can understand it better; my email is ashishwinoria@gmail.com. Thanks in advance. — Ashish Ratan, Feb 07 '14 at 04:30
And please tell me what the reducer class's input and output formats will be, I mean the key and value types for input and output. — Ashish Ratan, Feb 07 '14 at 04:38
