I am experimenting with Hadoop MapReduce, and in my tests I am able to store the output of the reducers in HBase. However, I want to write the data to a MySQL database instead of HBase. The mappers would still read their input data from HBase. I have found this but it requires to use MySQL at both input and output while I need it at only output. Also, above link uses some deprecated classes from org.apache.hadoop.mapred package for which a new package org.apache.hadoop.mapreduce is available now, however I am not able to find any tutorial using this new package till now.

vikas

1 Answer

> I have found this but it requires to use MySQL at both input and output while I need it at only output.

The InputFormat (DBInputFormat) is independent of the OutputFormat (DBOutputFormat). It should be possible to read from HBase in the Mapper and write to a DB in the Reducer.

With the new MR API, call Job#setInputFormatClass and Job#setOutputFormatClass; with the old MR API, call JobConf#setInputFormat and JobConf#setOutputFormat, in each case passing whichever input/output format is required. The two formats need not be the same. It should equally be possible to read from XML in the Mapper and write to a queue in the Reducer if required.
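As a rough sketch of the HBase-in, MySQL-out case with the new API (not tested against your setup): the table names `source_table` and `row_counts`, the columns `row_key` and `hits`, the JDBC URL, the credentials, the driver class name and all class names below are placeholders I made up. It assumes the HBase and MySQL connector jars are on the job's classpath. `TableMapReduceUtil.initTableMapperJob` sets the input format, and `DBOutputFormat.setOutput` sets the output format.

```java
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class HBaseToMySql {

    // One output row. DBOutputFormat writes the reducer's output KEY and ignores the value.
    public static class CountRecord implements DBWritable {
        private String rowKey;
        private int hits;

        public CountRecord() { }
        public CountRecord(String rowKey, int hits) { this.rowKey = rowKey; this.hits = hits; }

        @Override
        public void write(PreparedStatement st) throws SQLException {
            // Parameter order must match the field order passed to DBOutputFormat.setOutput.
            st.setString(1, rowKey);
            st.setInt(2, hits);
        }

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            rowKey = rs.getString(1);
            hits = rs.getInt(2);
        }
    }

    // Reads HBase rows and emits (row key, 1); replace with your own logic.
    public static class RowMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(Bytes.toString(value.getRow())), new IntWritable(1));
        }
    }

    // Sums the counts per key and emits a DBWritable as the output key.
    public static class SumReducer extends Reducer<Text, IntWritable, CountRecord, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(new CountRecord(key.toString(), sum), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Configure the DB before creating the Job: the Job copies the Configuration,
        // so later changes to 'conf' are not visible to it. All values are placeholders.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://localhost:3306/mydb", "user", "password");

        // On very old Hadoop releases, use 'new Job(conf, name)' instead.
        Job job = Job.getInstance(conf, "hbase-to-mysql");
        job.setJarByClass(HBaseToMySql.class);

        // Input side: HBase table scan (this also sets the input format).
        TableMapReduceUtil.initTableMapperJob("source_table", new Scan(),
                RowMapper.class, Text.class, IntWritable.class, job);

        // Output side: MySQL via DBOutputFormat (this also sets the output format).
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(CountRecord.class);
        job.setOutputValueClass(NullWritable.class);
        DBOutputFormat.setOutput(job, "row_counts", "row_key", "hits");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that every import here comes from org.apache.hadoop.mapreduce (plus the HBase mapreduce package), so nothing from the old org.apache.hadoop.mapred API is mixed in.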

> Also, above link uses some deprecated classes from org.apache.hadoop.mapred package for which a new package org.apache.hadoop.mapreduce is available now, however I am not able to find any tutorial using this new package till now.

If you are comfortable with the old API, then go ahead and use it. There is not much difference in functionality between the new and the old API. There are two DBInputFormat classes, one for each API: org.apache.hadoop.mapred.lib.db.DBInputFormat (old) and org.apache.hadoop.mapreduce.lib.db.DBInputFormat (new). Make sure you don't mix input/output formats from one API with a job built on the other.

Here is a [tutorial](http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html) on the new API.

Praveen Sripati
  • I thought the same and used the new API for this. However, I am getting an NPE at some point in the DBConfiguration class, so I am looking for the correct usage of DBInputFormat, DBConfiguration and DBOutputFormat. – vikas Dec 06 '11 at 16:58
  • Thanks for your last link. I found my problem by going through the comments on that [tutorial](http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html). The author's reply to the first comment there resolved it. – vikas Dec 07 '11 at 06:01