I have an application that is basically an HBase MapReduce job using Apache Gora. My case is very simple: I want to copy the data of one HBase table to a new table. Where do I specify the new table's name? I have reviewed this guide but could not find where to put it. The following is the code snippet:

  /* Mappers are initialized with GoraMapper.initMapperJob() or
   * GoraInputFormat.setInput(). */
  GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
      LogAnalyticsMapper.class, true);

  /* Reducers are initialized with GoraReducer.initReducerJob().
   * If the output is not to be persisted via Gora, any reducer
   * can be used instead. */
  GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

With a plain MapReduce job, this case would be very easy.

Hafiz Muhammad Shafiq

1 Answer

I will redirect you to the tutorial, but I will try to clarify here :)

The table name is defined in your mappings; see Table Mappings in the tutorial. You probably have a file called gora-hbase-mapping.xml where the mapping is defined. It should contain something like this:

<table name="Nameofatable">
...
<class name="blah.blah.EntityA" keyClass="java.lang.Long" table="Nameofatable">

That is where you configure the table name (use the same name in both places if both are present). There can be several <table> and <class> elements, for example one pair for your input and one for your output.

After that, you have to instantiate your input and output datastores, inStore and outStore. The tutorial got a bit messy and the creation of inStore and outStore ended up in the wrong section. You just do something like:

inStore = DataStoreFactory.getDataStore(String.class, EntityA.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(Long.class, OtherEntity.class, hadoopConf);

The same explanation, the other way around:

  • You instantiate the datastore with DataStoreFactory.getDataStore(key class, entity class, conf).
  • The requested entity class is looked up in gora-hbase-mapping.xml, in the element <class name="blah.blah.EntityA" ...>.
  • That <class> element has the attribute table=. That is your table name :)

So: you define one entity as input with its table name, and you define another entity as output with its table name.
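The lookup described above can be sketched in plain Java. This is only an illustration of the idea (entity class name in, table name out), not Gora's actual implementation. The class MappingLookup, the method tableFor, and the table names source_table and copy_table are made up for this example, and the <gora-otd> root element is an assumption about the mapping file layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustration only: mimics the class-to-table lookup, it is not Gora code. */
class MappingLookup {

    /** Hypothetical mapping with one entity for the input and one for the output. */
    static final String SAMPLE_MAPPING =
          "<gora-otd>"
        + "<table name=\"source_table\"/>"
        + "<table name=\"copy_table\"/>"
        + "<class name=\"blah.blah.EntityA\" keyClass=\"java.lang.String\" table=\"source_table\"/>"
        + "<class name=\"blah.blah.OtherEntity\" keyClass=\"java.lang.Long\" table=\"copy_table\"/>"
        + "</gora-otd>";

    /**
     * Returns the table= attribute of the <class> element whose name=
     * attribute equals entityClassName, or null if there is no such element.
     */
    public static String tableFor(String mappingXml, String entityClassName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            mappingXml.getBytes(StandardCharsets.UTF_8)));
            NodeList classes = doc.getElementsByTagName("class");
            for (int i = 0; i < classes.getLength(); i++) {
                Element cls = (Element) classes.item(i);
                if (entityClassName.equals(cls.getAttribute("name"))) {
                    return cls.getAttribute("table");
                }
            }
            return null;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Could not read mapping", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tableFor(SAMPLE_MAPPING, "blah.blah.EntityA"));     // prints source_table
        System.out.println(tableFor(SAMPLE_MAPPING, "blah.blah.OtherEntity")); // prints copy_table
    }
}
```

The point of the sketch is that the table name never appears in the job code: it follows from which entity class you pass when creating each datastore.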


EDIT 1:

If the entity class is the same but the table names are different, the only solution I can think of is to create two classes, Entity1 and Entity2, with the same schema, and to define two <table> elements and two <class> elements in your gora-hbase-mapping.xml. Then instantiate the stores like this:

inStore = DataStoreFactory.getDataStore(String.class, Entity1.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(String.class, Entity2.class, hadoopConf);

It is not very clean but it should work :\
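As a rough sketch, the mapping for this workaround could look like the fragment below. Everything specific in it is a placeholder: the table names source_table and copy_table, the column family f, and the content field are invented for illustration, and the root element is assumed to be <gora-otd> (check the existing gora-hbase-mapping.xml in your installation and copy its real families and fields into both <class> elements).

```xml
<gora-otd>
  <!-- Source table, read through Entity1 -->
  <table name="source_table">
    <family name="f"/>
  </table>

  <!-- Destination table, written through Entity2 -->
  <table name="copy_table">
    <family name="f"/>
  </table>

  <class name="blah.blah.Entity1" keyClass="java.lang.String" table="source_table">
    <field name="content" family="f" qualifier="cnt"/>
  </class>

  <!-- Same fields as Entity1, only the class name and table differ -->
  <class name="blah.blah.Entity2" keyClass="java.lang.String" table="copy_table">
    <field name="content" family="f" qualifier="cnt"/>
  </class>
</gora-otd>
```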


EDIT 2 (not for this question):

If the source table and the destination table are the same, there is a version of initReducerJob that allows this behavior. An example is in Nutch's GeneratorJob.java:

StorageUtils.initMapperJob(currentJob, fields, SelectorEntry.class, WebPage.class, GeneratorMapper.class, SelectorEntryPartitioner.class, true);
StorageUtils.initReducerJob(currentJob, GeneratorReducer.class);
Alfonso Nishikawa
  • Thanks, brother. My use case is actually Apache Nutch. I was thinking along the lines you describe. But how can I handle two different tables (source + sink) with the same schema in one job? In my case, I have to copy one HBase table (created by Nutch) to a new table. – Hafiz Muhammad Shafiq Aug 28 '19 at 03:52
  • Both entities belong to same class in my case. – Hafiz Muhammad Shafiq Aug 28 '19 at 03:53
  • Updated the answer. Basically, treat the input and output as if they were different classes (you will have to create another entity class, though it will be almost identical). – Alfonso Nishikawa Aug 28 '19 at 10:13