
How can I transfer an HBase table into Hive correctly?

What I tried before is described in this question: How insert overwrite table in hive with diffrent where clauses? (I made one table to import all the data. The problem is that the data is still in rows and not in columns. So I made 3 tables, for news, social and all, each with a specific where clause. After that I ran 2 joins on those tables, which gives me the result table. That makes 6 tables in total, which does not perform well!)

To sum my problem up: in HBase, the column families are saved as rows like this.

count   verpassen   news    1
count   verpassen   social  0
count   verpassen   all     1

What I want to achieve in Hive is a data structure like this:

name      news    social   all
verpassen 1       0        1

How am I supposed to do this?

dino
  • There is a whole page on [Hive-Hbase Integration on the Hive wiki](https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration) – OneCricketeer Oct 30 '15 at 16:12
  • You can use the HBase storage handler in Hive – madhu Oct 30 '15 at 16:48
  • The HBase storage handler doesn't work for me. I got this exception: FAILED: SemanticException Cannot find class 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' – dino Nov 04 '15 at 13:57

1 Answer


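You can use the following approach.

Use the HBase storage handler to create the table in Hive.

Example script (for an HBase table that already exists, the Hive wiki's HBaseIntegration page uses CREATE EXTERNAL TABLE instead of CREATE TABLE):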

CREATE TABLE hbase_table_1(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:val")
TBLPROPERTIES ("hbase.table.name" = "test");

I loaded the sample data you gave into a Hive external table.


select name, collect_set(concat_ws(',', type, val)) input from TESTTABLE group by name;

Here I group the data by name. The query returns one row per name, with the "type,val" pairs collected into a single array in the input column.
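To make the shape of that grouped output concrete, here is a rough plain-Python simulation of the grouping step. It is only an illustration, not what Hive runs: the helper name `group_pairs` is hypothetical, the sample rows are the ones from the question, and Hive's `collect_set` does not guarantee the element order shown here.

```python
from collections import defaultdict

# Sample rows from the question as (name, type, val) tuples.
rows = [
    ("verpassen", "news", "1"),
    ("verpassen", "social", "0"),
    ("verpassen", "all", "1"),
]


def group_pairs(rows):
    """Mimic collect_set(concat_ws(',', type, val)) ... group by name."""
    grouped = defaultdict(list)
    for name, kind, val in rows:
        pair = ",".join((kind, val))   # concat_ws(',', type, val)
        if pair not in grouped[name]:  # collect_set drops duplicates
            grouped[name].append(pair)
    return dict(grouped)


result = group_pairs(rows)
print(result)
```

Each name ends up with one list of "type,val" strings, which is the value the mapper in the next step has to split back into columns.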

Next, I wrote a custom mapper that takes the input column as its input parameter and emits the values as separate columns.

from (select '["all,1","social,0","news,1"]' input from TESTTABLE group by name) d
MAP d.input USING 'python test.py' AS all, social, news
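The answer does not show test.py itself. A minimal sketch of what such a mapper could look like follows, assuming each line on stdin is a JSON-style array string like the literal in the query above. The names `COLUMNS` and `pivot_line` are hypothetical, and a real array column passed from Hive may be serialized differently, so the parsing step may need adjusting.

```python
import json
import sys

# Output column order; matches "AS all, social, news" in the query.
COLUMNS = ("all", "social", "news")


def pivot_line(line, columns=COLUMNS):
    """Turn '["all,1","social,0","news,1"]' into tab-separated values."""
    values = {}
    for pair in json.loads(line):
        kind, _, val = pair.partition(",")  # "news,1" -> ("news", "1")
        values[kind] = val
    # Hive reads a cell containing only \N in script output as NULL.
    return "\t".join(values.get(col, "\\N") for col in columns)


def main(stream=sys.stdin, out=sys.stdout):
    # Hive sends one input row per line and expects one output row per line.
    for line in stream:
        line = line.strip()
        if line:
            out.write(pivot_line(line) + "\n")


# Only read stdin when input is actually piped in (as Hive does).
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Before the script can be used in a query it has to be made available to the Hive session, for example with ADD FILE test.py.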


Alternatively, you can insert the output into another table that has the columns name, all, social and news.

Hope this helps

yoga
  • When I try to create a table I get this exception: FAILED: SemanticException Cannot find class 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' – dino Nov 04 '15 at 13:56
  • You have to add the HBase jars to the Hive lib folder: hbase-*.jar, zookeeper*.jar, hive-hbase-handler*.jar – yoga Nov 04 '15 at 15:50
  • zookeeper.jar and hive-hbase-handler-1.2.1 are already in there. But I have a lot of hbase*.jar files, for example hbase-it-0.98.0-hadoop2.jar or hbase-client-0.98.0-hadoop2.jar. Which one is the right one? – dino Nov 04 '15 at 17:30
  • I just copied all hbase*.jar files from HBase to Hive and reran the example. Now I get: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hbase/HBaseConfiguration – dino Nov 04 '15 at 17:43
  • http://stackoverflow.com/questions/33544169/what-is-the-right-way-to-set-up-hbase-hadoop-hive-to-access-hbase-through-hive – dino Nov 05 '15 at 12:10