I am implementing a Hadoop MapReduce job. The input to my mapper is a table like the one below:
customerid, IP, Attr , Date
customer1, IP1, attr1, date1
customer2, IP2, attr1, date2
The mapper should write its output to multiple files:
File 1 : IP-m-00000
key, value
customer1_IP1 , date1
customer2_IP2 , date2
File 2: Attr-m-00000
key, value
customer1_attr1 , date1
customer2_attr1 , date2
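To make the mapping concrete, here is a plain-Java sketch (no Hadoop classes) of the two records I want for each input row. The class and method names are made up for illustration only. In the real job the first pair would go to the "IP" output and the second to the "Attr" output.

```java
// Illustration only: NamedOutputSketch and its methods are hypothetical names,
// not part of Hadoop. They just show the key/value pairs I expect per row.
final class NamedOutputSketch {

    /** Splits one row "customerid, IP, Attr, Date" into four trimmed fields. */
    static String[] parseRow(String line) {
        String[] fields = line.split(",");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    /** Record for the "IP" file: key = customerid_IP, value = date. */
    static String[] ipRecord(String[] f) {
        return new String[] { f[0] + "_" + f[1], f[3] };
    }

    /** Record for the "Attr" file: key = customerid_Attr, value = date. */
    static String[] attrRecord(String[] f) {
        return new String[] { f[0] + "_" + f[2], f[3] };
    }

    public static void main(String[] args) {
        String[] rows = {
            "customer1, IP1, attr1, date1",
            "customer2, IP2, attr1, date2"
        };
        for (String row : rows) {
            String[] f = parseRow(row);
            String[] ip = ipRecord(f);
            String[] attr = attrRecord(f);
            // First column names the target output, then key and value.
            System.out.println("IP\t" + ip[0] + "\t" + ip[1]);
            System.out.println("Attr\t" + attr[0] + "\t" + attr[1]);
        }
    }
}
```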
I have Hadoop 2.2.0 installed and I am using the following code:
MultipleOutputs.addMultiNamedOutput(job, "IP", TextOutputFormat.class, Text.class, Text.class); // in the driver class
MultipleOutputs.getCollector("IP", context).collect(txtKey, txtValue); // in the mapper class
Here txtKey is customerid_$Attribute and txtValue is the date.
I also have Hadoop 2.8.0 installed on a personal machine. There the MultipleOutputs object has a write() method, which was very easy to use. As far as I can tell, the MultipleOutputs.write() I used in hadoop-2.8.0 is not available in my hadoop-2.2.0 setup.
Any ideas on how to write multiple output files in hadoop-2.2.0 without the MultipleOutputs.write() functionality?
If this question needs any changes, please leave a comment rather than closing it.
Thanks, Guru