I need to write a MapReduce job that takes two input files. The first input file looks like this:
key1 , 25
key1 , 35
key1 , 60
key2 , 30
key3 , 45
key3 , 65
The second input file looks like this:
key1, -10
key2, -20
key3, -15
I need to produce this output:
key1 , 15
key1 , 25
key1 , 50
key2 , 10
key3 , 30
key3 , 50
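(Each output value is the first file's value combined with the second file's value for the same key. Since the second file stores negative numbers, this is a plain sum, for example 25 + (-10) = 15.)
How could this be done? What would the mapper and reducer tasks look like?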
My approach is as follows:
I think I will need two mappers, one per input file (or can a single mapper read both files?). Each mapper would simply emit the key and the value.
In the reducer, once I have received all the values for a key, I have to offset each value that came from the first file by the value that came from the second file.
So I need to know whether a given value came from the first input file or the second. How can this be done?
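To make the idea concrete, here is a rough sketch of the approach in plain Python rather than actual Hadoop code. It assumes each mapper can tag its values with the file they came from; the tags `"A"` and `"B"` and the names `map_first`, `map_second`, `reduce_key` and `run_job` are made up for illustration, and the shuffle step is simulated with a dictionary.

```python
from collections import defaultdict


def parse(line):
    # Split "key1 , 25" or "key1, -10" into ("key1", 25) / ("key1", -10).
    key, value = line.split(",")
    return key.strip(), int(value)


def map_first(line):
    # Hypothetical mapper for the first file: tag each value with "A".
    key, value = parse(line)
    return key, ("A", value)


def map_second(line):
    # Hypothetical mapper for the second file: tag each value with "B".
    key, value = parse(line)
    return key, ("B", value)


def reduce_key(key, tagged_values):
    # The second file's values are already negative, so they are added.
    offset = sum(v for tag, v in tagged_values if tag == "B")
    return [(key, v + offset) for tag, v in tagged_values if tag == "A"]


def run_job(first_lines, second_lines):
    # Simulated shuffle: group all tagged values by key.
    groups = defaultdict(list)
    for line in first_lines:
        key, tagged = map_first(line)
        groups[key].append(tagged)
    for line in second_lines:
        key, tagged = map_second(line)
        groups[key].append(tagged)
    output = []
    for key in sorted(groups):
        output.extend(reduce_key(key, groups[key]))
    return output


first = ["key1 , 25", "key1 , 35", "key1 , 60",
         "key2 , 30", "key3 , 45", "key3 , 65"]
second = ["key1, -10", "key2, -20", "key3, -15"]

result = run_job(first, second)
for key, value in result:
    print(f"{key} , {value}")
```

In this sketch the reducer sees the whole list of values for a key at once. In a real reducer the values arrive as a stream, so I assume the first file's values would have to be buffered until the second file's value has been seen, or the values would have to be ordered so that the second file's value arrives first.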
Are there any better approaches?