
I have implemented a MapReduce operation for log files using Amazon EMR and Hadoop with a custom JAR.

My output shows the correct keys and values, but all the records are printed on a single line. For example, given the following pairs:

<1387, 2>
<1388, 1>

This is what's printing:

1387     21388     1

This is what I'm expecting:

1387     2
1388     1

How can I fix this?

  • Can you please show the code where you print out the records? As it stands, we have no clue what you're doing! (My guess is that you're missing adding a newline character) – Chris Forrence Sep 11 '14 at 16:30
  • public static void main(String[] args) throws Exception { JobConf conf = new JobConf(LogAnalyzer.class); conf.setJobName("Loganalyzer"); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); conf.setMapperClass(LogAnalyzer.Map.class); conf.setCombinerClass(LogAnalyzer.Reduce.class); conf.setReducerClass(LogAnalyzer.Reduce.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); conf.set("mapreduce.textoutputformat.separator", "--"); – Deepa Sadagopan Sep 13 '14 at 13:28
  • FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); } – Deepa Sadagopan Sep 13 '14 at 13:31
  • Reduce function : public static class Reduce extends MapReduceBase implements Reducer { public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) throws IOException { int sum = 0; while (values.hasNext()) { sum += values.next().get(); } output.collect(key, new IntWritable(sum)); } } – Deepa Sadagopan Sep 13 '14 at 13:32
  • Map function: public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException { String line = ((Text) value).toString(); Matcher matcher = p.matcher(line); if (matcher.matches()) { String timestamp = matcher.group(4); minute.set(getMinuteBucket(timestamp)); output.collect(minute, ONE); //context.write(minute, one); } } – Deepa Sadagopan Sep 13 '14 at 13:32

1 Answer


Cleaned up your code for you :)

public static void main(String[] args) throws Exception {   
    JobConf conf = new JobConf(LogAnalyzer.class);
    conf.setJobName("Loganalyzer");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(LogAnalyzer.Map.class);
    conf.setCombinerClass(LogAnalyzer.Reduce.class);
    conf.setReducerClass(LogAnalyzer.Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.set("mapreduce.textoutputformat.separator", "--");
    FileInputFormat.setInputPaths(conf, new Path(args[0])); 
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf); 
}

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { 
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { 
        int sum = 0; 
        while (values.hasNext()) {
            sum += values.next().get(); 
        }
        output.collect(key, new IntWritable(sum)); 
    } 
}


public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    String line = value.toString(); // value is already a Text; the cast was redundant
    Matcher matcher = p.matcher(line); 
    if (matcher.matches()) { 
        String timestamp = matcher.group(4); 
        minute.set(getMinuteBucket(timestamp)); 
        output.collect(minute, ONE); //context.write(minute, one); 
    }
}

This isn't hadoop-streaming; it's a normal Java job, so you should amend the tag on the question.

This looks okay to me, although you don't have the mapper inside a class, which I assume is a copy/paste omission.

As for the line endings: are you by any chance viewing the output on Windows? It could be a Unix/Windows line-ending mismatch. If you open the file in Sublime Text or another capable editor, you can switch between Unix and Windows line endings; see if that fixes it.
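You can also check from a terminal whether the newlines are really in the file. A minimal sketch; it writes two sample records to a throwaway file (`sample.txt` is just an illustration, in practice you'd point `cat` at your job's `part-00000` output) and makes the invisible separators printable:

```shell
# Write two records the way TextOutputFormat does by default:
# tab between key and value, one '\n' after each record.
printf '1387\t2\n1388\t1\n' > sample.txt

# -e prints '$' at the end of each line, -t prints tabs as '^I'
# (both flags work with GNU and BSD cat).
cat -et sample.txt
# 1387^I2$
# 1388^I1$
```

If you see `$` at the end of each record, the Unix newlines are there and the file is fine; a `^M$` ending would indicate Windows-style CRLF. If the records really do run together with no `$` between them, the problem is in how the output is being written, not in the viewer.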

Nonnib