I have a use case where I have to read only the odd-numbered lines of a file using Java MapReduce.

However, the InputFormat class only recognizes '\n' as the line terminator. I want to read the file as follows:

INPUT:
sampat
kumar
hadoop
mapredue

OUTPUT:
sampat
hadoop

  • Have you tried changing the newline character to a space? See http://stackoverflow.com/questions/12118836/how-to-read-text-source-in-hadoop-separated-by-special-character and https://amalgjose.com/2013/05/27/custom-text-input-format-record-delimiter-for-hadoop/ – Ronak Patel Jun 22 '16 at 15:15

1 Answer

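You can also get the desired output from your input this way, without writing a custom input/output format. Note that the code splits each record on spaces, so it assumes the words are separated by spaces rather than newlines: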

input:

sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4

output:

sampat1 hadoop3 sampat1 hadoop3 sampat1 hadoop3 sampat1 hadoop3 sampat1 hadoop3 

code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class OddLine {

    public static class OddLineMapper extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            // Each record is one line of space-separated words.
            String[] words = value.toString().split(" ");

            // Create the builder per call so output from earlier records does not accumulate.
            StringBuilder sb = new StringBuilder();

            // Keep the 1st, 3rd, 5th, ... word (indices 0, 2, 4, ...).
            for (int i = 0; i < words.length; i += 2)
                sb.append(words[i]).append(" ");

            context.write(new Text(""), new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "Get odd words");
        job.setJarByClass(OddLine.class);
        job.setMapperClass(OddLineMapper.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Delete the output directory if it already exists; otherwise the job fails at startup.
        FileSystem fs = null;
        Path dstFilePath = new Path(args[1]);
        try {
            fs = dstFilePath.getFileSystem(conf);
            if (fs.exists(dstFilePath))
                fs.delete(dstFilePath, true);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
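
If you want to check the selection logic from map() without a cluster, here is a minimal plain-Java sketch of the same idea. The class name OddTokens and the helper oddTokens are hypothetical names used only for illustration, not part of the job above. As a variation on the mapper, it splits on any whitespace (so newline-separated input works too) and does not leave a trailing space.

```java
import java.util.StringJoiner;

public class OddTokens {

    // Hypothetical helper (not part of the Hadoop job): keeps the 1st, 3rd, 5th, ... token.
    static String oddTokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        // Split on any run of whitespace, so spaces and newlines are both accepted.
        String[] tokens = trimmed.split("\\s+");
        StringJoiner out = new StringJoiner(" ");
        // Indices 0, 2, 4, ... correspond to the odd-numbered tokens.
        for (int i = 0; i < tokens.length; i += 2) {
            out.add(tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "sampat hadoop"
        System.out.println(oddTokens("sampat kumar hadoop mapredue"));
    }
}
```

The same loop could be dropped into the mapper in place of the split(" ") version if you prefer output without the trailing space.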
Ronak Patel