I am trying to process four lines of a data set together as one record. I use a variable lineCount in the mapper to track each line's position within the record, but part of the output is wrong.

Here is my mapper class:

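import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GC_Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Number of lines this mapper instance has seen so far; lineCount % 4
    // gives the position of the current line within its four-line record.
    int lineCount = 0;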

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (lineCount % 4 == 0) {
            context.write(new Text("#Reads"), new IntWritable(1));
            lineCount++;
            return;
        }

        if (lineCount % 4 == 1) {
            context.write(new Text("X"), new IntWritable(1));               
            lineCount++;
            return;
        }

        if (lineCount % 4 == 2) {
            context.write(new Text("Y"), new IntWritable(1));
            lineCount++;
            return;
        }

        if (lineCount % 4 == 3) {
            context.write(new Text("Z"), new IntWritable(1));
            lineCount++;
            return;
        }
    }
}

My reducer:

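import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class GC_Reducer extends
        Reducer<Text, IntWritable, Text, DoubleWritable> {
    // Instance field, so its value persists across reduce() calls.
    int numReads;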

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        if ((key.toString()).startsWith("#")) {
            for (IntWritable read : values) {
                numReads += read.get();
            }
            context.write(key, new DoubleWritable(numReads));
        }

        if ((key.toString().startsWith("X"))) {
            double sum1 = 0;
            for (IntWritable val : values) {
                sum1 += val.get();
            }
            context.write(key, new DoubleWritable(sum1));
        }

        if ((key.toString().startsWith("Y"))) {
            double sum2 = 0;
            for (IntWritable val : values) {
                sum2 += val.get();
            }
            context.write(key, new DoubleWritable(sum2));
        }

        if ((key.toString().startsWith("Z"))) {
            double sum3 = 0;
            for (IntWritable val : values) {
                sum3 += val.get();
            }
            context.write(key, new DoubleWritable(sum3));
        }
    }
}

My intention was to count the number of reads (treating every four lines as a single record) and to process each of the four lines differently. The problem is that I get this output:

#Reads 50.0
X      100.0
Y      100.0
Z      100.0  

My desired output was 50.0 for all the keys, but only the #Reads value is correct. Please help me find a solution. Thanks in advance!

Sachin
  • What is your input? And why are you using a double variable when the input is IntWritable? – Jijo Mar 04 '15 at 06:10
  • My input is a sequence file in which each sequence consists of four lines, and I want to process the four lines differently. This is a sample program; the actual processing is not included. I just output the count of each line. – Sachin Mar 04 '15 at 06:14
  • Never mind the output value type. Even if I use IntWritable, the output is the same. – Sachin Mar 04 '15 at 06:16
  • How about defining all the variables in the reduce part as global, similar to numReads? – Jijo Mar 04 '15 at 06:19
  • I don't think that will make any difference. – Sachin Mar 04 '15 at 06:22
  • Then how do you explain that only the first field has the correct output? – Jijo Mar 04 '15 at 06:24
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/72200/discussion-between-sachin-and-jijo). – Sachin Mar 04 '15 at 06:24

2 Answers

If all your data is in a 4-line-record format, it would be better to work with a FileInputFormat together with a RecordReader. You need to send the four lines of a record to the mapper together, rather than line by line.

Take a look at this answer to my question about reading PDFs in Hadoop. Most of your work will go into the nextKeyValue function of your class that extends RecordReader.
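
As a rough illustration of the grouping that nextKeyValue would have to do, here is a minimal sketch in plain Java. It deliberately uses no Hadoop classes: FourLineRecordReader is a hypothetical stand-in that only borrows the nextKeyValue/getCurrentKey/getCurrentValue method names, the sample lines are made up, and it ignores input splits, which a real RecordReader would also have to handle.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class extending Hadoop's RecordReader.
// It only mirrors the method names; it is not the Hadoop API.
class FourLineRecordReader {
    private static final int LINES_PER_RECORD = 4;

    private final BufferedReader in;
    private long recordIndex = -1;       // key: index of the current record
    private List<String> currentRecord;  // value: the four lines of the record

    FourLineRecordReader(BufferedReader in) {
        this.in = in;
    }

    // Reads the next four lines as one record. Returns false when the input
    // is exhausted or only an incomplete trailing record remains.
    boolean nextKeyValue() throws IOException {
        List<String> lines = new ArrayList<>(LINES_PER_RECORD);
        for (int i = 0; i < LINES_PER_RECORD; i++) {
            String line = in.readLine();
            if (line == null) {
                currentRecord = null;
                return false;
            }
            lines.add(line);
        }
        currentRecord = lines;
        recordIndex++;
        return true;
    }

    long getCurrentKey() {
        return recordIndex;
    }

    List<String> getCurrentValue() {
        return currentRecord;
    }

    public static void main(String[] args) throws IOException {
        // Nine made-up lines: two complete records plus one leftover line.
        String input = "a1\na2\na3\na4\nb1\nb2\nb3\nb4\nc1\n";
        FourLineRecordReader reader =
                new FourLineRecordReader(new BufferedReader(new StringReader(input)));
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " -> " + reader.getCurrentValue());
        }
    }
}
```

With records delivered this way, the mapper receives all four lines in a single call, so it no longer needs a lineCount field to work out which line it is looking at.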

Mehraban

I found the answer myself. It was a mistake on my part. My mapper output value was IntWritable, but in the reducer I assigned it to a double variable and wrote that value out as DoubleWritable. Thanks all!
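
For reference, the counting the job is meant to perform can be sketched in plain Java with int sums kept end to end. This is only an illustration under stated assumptions: LinePositionTally and its tally method are hypothetical names, no Hadoop classes are involved, and the 200-line input size is assumed from the desired value of 50 per key. It does not reproduce the original job or the type mismatch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java tally; not part of the original job.
class LinePositionTally {
    // Same keys the mapper emits, indexed by lineCount % 4.
    private static final String[] KEYS = {"#Reads", "X", "Y", "Z"};

    // Counts how many lines fall on each position of a four-line record,
    // using int values throughout.
    static Map<String, Integer> tally(int totalLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int lineCount = 0; lineCount < totalLines; lineCount++) {
            counts.merge(KEYS[lineCount % 4], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input size: 200 lines, i.e. 50 four-line records.
        System.out.println(tally(200));
    }
}
```

For 200 lines, every key ends up with a count of 50, which is the result the question expects.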

Sachin