Hi, I am new to MapReduce and HBase; please guide me. I am moving tabular data into HBase using MapReduce. I have created a MapReduce job that reads tabular data from a file and puts it into HBase using the HBase APIs, so the data has now reached HBase (and therefore HDFS).

My question is: can I query the HBase data using MapReduce? I don't want to run HBase commands to query the data. Is it possible to query HBase data from a MapReduce job instead?

Please help or advise.

Umesh K

1 Answer

Of course you can. HBase comes with TableMapReduceUtil, which helps you configure MapReduce jobs that scan table data. By default, it creates one map task for each region of the table.

Please check this example extracted from the HBase book:

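import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

Configuration config = HBaseConfiguration.create();
Job job = Job.getInstance(config, "ExampleRead");  // new Job(config, name) is deprecated in Hadoop 2+
job.setJarByClass(MyReadJob.class);     // class that contains mapper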

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper (a class that extends TableMapper)
  null,             // mapper output key
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
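
To make the "one map task per region" behaviour more concrete, here is a minimal sketch in plain Java that needs no HBase on the classpath. It is a simplified model and not HBase's actual implementation: the class RegionSplitSketch, the method splitsFor, and the sample region boundaries are all hypothetical, and row keys are modelled as Strings rather than byte arrays. It only illustrates the idea that each region overlapping the scan's row range yields one split, clipped to that range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of per-region input splits (not HBase code). */
final class RegionSplitSketch {

    /** Row range [startRow, stopRow) that one map task would scan; "" means unbounded. */
    static final class Split {
        final String startRow;
        final String stopRow;

        Split(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    /**
     * regionStartKeys must be sorted and begin with "" (the first region is unbounded below).
     * Each region ends where the next one starts; the last region is unbounded above.
     * Assumption: String ordering stands in for HBase's byte-wise row key ordering.
     */
    static List<Split> splitsFor(List<String> regionStartKeys, String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String regionStart = regionStartKeys.get(i);
            String regionEnd = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";

            // A region is relevant only if it overlaps the scan's [scanStart, scanStop) range.
            boolean startsBeforeScanStops = scanStop.isEmpty() || regionStart.compareTo(scanStop) < 0;
            boolean endsAfterScanStarts = regionEnd.isEmpty() || scanStart.compareTo(regionEnd) < 0;
            if (!startsBeforeScanStops || !endsAfterScanStarts) {
                continue;
            }

            // Clip the region to the scan range: later of the two starts, earlier of the two stops.
            String start = regionStart.compareTo(scanStart) >= 0 ? regionStart : scanStart;
            String stop;
            if (regionEnd.isEmpty()) {
                stop = scanStop;
            } else if (scanStop.isEmpty()) {
                stop = regionEnd;
            } else {
                stop = regionEnd.compareTo(scanStop) <= 0 ? regionEnd : scanStop;
            }
            splits.add(new Split(start, stop));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: ["", "g"), ["g", "p"), ["p", "").
        List<String> regionStartKeys = Arrays.asList("", "g", "p");
        System.out.println(splitsFor(regionStartKeys, "", ""));   // unbounded scan
        System.out.println(splitsFor(regionStartKeys, "c", "k")); // bounded scan
    }
}
```

In this model, an unbounded scan over the three sample regions produces three splits, while a scan from "c" to "k" touches only the first two regions and produces two. Restricting the Scan's row range in the real job should narrow the work in a similar way.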

More examples are available in the MapReduce chapter of the HBase book.

Rubén Moraleda