This is a follow-up to the question "Extracting rows containing specific value using mapReduce and hadoop".
Mapper function

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    for(String word: words )
    {
        if(words[3].equals("40")){  
            saleValue.set(Integer.parseInt(words[0]));
            rangeValue.set(words[3]);
            con.write( rangeValue , saleValue );
        }
    }
}   
}

Reducer function

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>  
{  
    private IntWritable result = new IntWritable();  
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException  
    {  
        for(IntWritable value : values)  
        {  
            result.set(value.get());  
            con.write(word, result);  
        }  
    }  
}

Output obtained is

40 105  
40 105  
40 105  
40 105

EDIT 1: But the expected output is

40 102  
40 104  
40 105

What am I doing wrong?

What exactly is happening in the mapper and reducer functions here?

user6119874
  • You are writing out Key Value pairs... What more do you want to know? – OneCricketeer May 07 '16 at 20:15
  • Thanks for the suggestion @cricket_007, I will definitely try that ... I actually wanted to know what EXACTLY the mapper returns and what the reducer accepts and prints. – user6119874 May 07 '16 at 20:45
  • When you `extends` them, the order is `<KEYIN, VALUEIN, KEYOUT, VALUEOUT>` for both classes. And the output key-value types of the mapper **must** match the input key-value types of the reducer – OneCricketeer May 07 '16 at 20:48
  • @cricket_007 Okay... So in my case what does map return (not mapper) ... as in 40 105 40 105 40 105 ? or something else ? Please check the edit I made in question. – user6119874 May 07 '16 at 20:55
  • It's `void`, so nothing is returned, but what is read by the reducer is `(40, [105, 105, 105])` – OneCricketeer May 07 '16 at 20:58
  • To give some more info: mappers write values for the reducer (they do not "return" them) using the context object, and reducers emit values to the output (again through the context, not by "return"). All the values with the same key are sent to the same reducer (this actually happens in the shuffle stage), so each reduce call then runs on a set of values with the same key. – It-Z May 07 '16 at 21:18
  • Thanks @It-Z that's exactly what I was looking for. – user6119874 May 25 '16 at 07:02

3 Answers

What exactly is happening

You are consuming lines of comma-delimited text, splitting them on the commas, and filtering out some values. con.write() should be called only once per line if all you are doing is extracting those values.

The framework (in the shuffle stage, after the mapper) groups all the "40" keys that you output and forms a list of all the values that were written with that key. That list is what the reducer iterates over.

You should probably try this for your map function.

// Set the values to write 
saleValue.set(Integer.parseInt(words[0]));
rangeValue.set(words[3]);

// Filter out only the 40s
if(words[3].equals("40")) {
    // Write out "(40, saleValue)" words.length times 
    for(String word: words )
    {
        con.write( rangeValue , saleValue );
    }
}

If you don't want duplicate values for the length of the split string, then get rid of the for loop.

All your reducer is doing is just printing out what it received from the mapper.
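To see why the duplicates appear, here is a minimal plain-Java sketch (no Hadoop dependency) of the map and shuffle steps. The class `ShuffleSketch`, the method `mapAndShuffle`, and the input rows are all hypothetical, since the real data set is not shown in the question. It only imitates the grouping by key; real Hadoop does not guarantee the order of the values within a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Stand-in for map() plus the shuffle: collects every (key, value) the
    // mapper would write and groups the values by key, which is the shape
    // of the input that reduce() receives.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines, boolean writeInsideLoop) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] words = line.split(",");
            if (!words[3].equals("40")) {
                continue; // filter: keep only rows whose fourth field is "40"
            }
            // The question's code writes once per field; the fix writes once per line.
            int writes = writeInsideLoop ? words.length : 1;
            for (int i = 0; i < writes; i++) {
                grouped.computeIfAbsent(words[3], k -> new ArrayList<>())
                       .add(Integer.parseInt(words[0]));
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical input rows (assumption: four comma-separated fields each).
        List<String> lines = Arrays.asList("102,a,b,40", "103,a,b,50", "104,a,b,40", "105,a,b,40");
        // Write inside the loop: each matching row appears words.length (4) times.
        System.out.println(mapAndShuffle(lines, true));
        // Write once per line: prints {40=[102, 104, 105]}
        System.out.println(mapAndShuffle(lines, false));
    }
}
```

With one write per line, a reducer that simply forwards each value yields one output row per matching input row.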

OneCricketeer

In the context of the original question, you don't need the loop in either the mapper or the reducer, because it duplicates entries:

public static class MapForWordCount extends Mapper<Object, Text, Text, IntWritable>{

private IntWritable saleValue = new IntWritable();
private Text rangeValue = new Text();

public void map(Object key, Text value, Context con) throws IOException, InterruptedException
{
    String line = value.toString();
    String[] words = line.split(",");
    if(words[3].equals("40")){  
       saleValue.set(Integer.parseInt(words[0]));
       rangeValue.set(words[3]);
       con.write(rangeValue , saleValue );
    }
}   
}

And in the reducer, as suggested by @Serhiy in the original question you need only one line of code:

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>
{
    private IntWritable result = new IntWritable();
    public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
    {
        // Emit each key once; the grouped values are ignored.
        con.write(word, null);
    }
}

Regarding "Edit 1": I will leave it as a trivial exercise :)

It-Z

Mapper output would be something like this :

<word,count>

Reducer output would be like this :

<unique word, its total count>

E.g., a line is read, and each word in it is counted and put in a <key,value> pair:

<40,1>
<140,1>
<50,1>
<40,1> ..

Here 40, 50, 140, ... are all keys, and the value is the count recorded for that key (1 per occurrence in this example). This happens in the mapper.

Then these key-value pairs are sent to the reducer, where identical keys are reduced to a single key and all the values associated with that key are summed to give the value of the output key-value pair. So the result of the reducer would be something like:

<40,10>
<50,5>
...
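The summing step described above can be sketched in plain Java, without Hadoop. The class `SumReduceSketch`, the method `reduceBySum`, and the grouped input are hypothetical and stand in for what the shuffle would hand to a word-count style reducer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SumReduceSketch {

    // Stand-in for a word-count style reduce(): one output per key, whose
    // value is the sum of every value grouped under that key.
    static Map<String, Integer> reduceBySum(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical shuffle output: each key with the 1s emitted for it.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("40", Arrays.asList(1, 1, 1));
        grouped.put("50", Arrays.asList(1, 1));
        grouped.put("140", Arrays.asList(1));
        // Keys are strings, so they sort lexicographically.
        System.out.println(reduceBySum(grouped)); // {140=1, 40=3, 50=2}
    }
}
```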

In your case, the reducer isn't doing anything. The unique values/words found by the mapper are just given out as the output.

Ideally, you are supposed to reduce and get an output like: "40,150" was found 5 times on the same line.

Ani Menon