Using HBase for small dataset and big data analysis at the same time?

Question

I am building an application that requires a lot of data processing and analytics (processing tons of files at the same time).

I am planning to use Hadoop (MapReduce, plus HBase on top of HDFS) for this.

At the same time I have small datasets, such as user settings, the application user listing, payment information and others, which can easily be managed in any conventional database such as MySQL or MongoDB.

Sometimes there may also be some aggregated and analysis data computed by Hadoop, but that data is not that big either.

My question is whether I should use two databases: something like MySQL/MongoDB for the small datasets and HBase for the big dataset.

Or can HBase do both jobs efficiently?

Please go through my answer at http://stackoverflow.com/questions/37781992/what-should-be-considered-before-choosing-hbase/37817519#37817519 - Ram Ghadiyaram, Jun 20 '16 at 13:14
Sir, thanks for the reply. Just reframing the question: I am pretty sure I will use Hadoop (MapReduce, HDFS) and HBase for my computation problem. My question is whether I can also store my small datasets, like user settings and user info, in HBase, and whether it is right to do so. - Pradeep Jaiswar, Jun 20 '16 at 18:47
I think you will have MySQL as a backing store if you have Hive installed. Why can't you consider that? I am assuming that you are looking up this static small dataset from your HBase + MapReduce (big dataset) job. - Ram Ghadiyaram, Jun 21 '16 at 05:08
There is no restriction that says a small dataset can't be stored in HBase. But you won't be able to join rows, and you will miss joins and other SQL features. - Ram Ghadiyaram, Jun 21 '16 at 05:10
I am assuming you want to look up static data (user settings, application user listing, payment information), either from HBase or from an RDBMS, inside a MapReduce mapper or reducer, isn't it? - Ram Ghadiyaram, Jun 21 '16 at 13:36
Yes, I want to, but this small dataset does not require any processing. It is a get query from the database. - Pradeep Jaiswar, Jun 22 '16 at 11:18
Yes, that's why I called it static data, which doesn't require any processing. - Ram Ghadiyaram, Jun 22 '16 at 11:25
No issues, you can do it from a database as well. For that, you can use the mapper's setup method to open the connection and the mapper's cleanup method to close it. This way you can make sure that connections won't get exhausted. - Ram Ghadiyaram, Jun 22 '16 at 11:44
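
The open-in-setup, close-in-cleanup pattern from the last comment could look roughly like the sketch below. To keep it runnable without Hadoop on the classpath, `LookupMapper` is a hypothetical stand-in that only mirrors the lifecycle of Hadoop's `Mapper` (whose real methods are `setup(Context)`, `map(key, value, Context)` and `cleanup(Context)`), and `SettingsStore` is a made-up in-memory placeholder for the actual database connection (a JDBC connection or an HBase table handle, for example). The sample rows are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupMapperSketch {

    /** Hypothetical stand-in for a real connection to the small-dataset store. */
    static class SettingsStore implements AutoCloseable {
        static int openCount = 0;                 // counts how many connections were opened
        private final Map<String, String> rows = new HashMap<>();
        private boolean open = true;

        SettingsStore() {
            openCount++;
            rows.put("user1", "theme=dark");      // invented sample data
            rows.put("user2", "theme=light");
        }

        /** Simple get-by-key lookup; no processing, as described in the question. */
        String get(String userId) {
            if (!open) throw new IllegalStateException("connection closed");
            return rows.getOrDefault(userId, "theme=default");
        }

        @Override
        public void close() { open = false; }
    }

    /** Mirrors the setup/map/cleanup lifecycle of a Hadoop Mapper (not the real class). */
    static class LookupMapper {
        private SettingsStore store;
        final List<String> output = new ArrayList<>();

        void setup() {                            // runs once per task: open the connection
            store = new SettingsStore();
        }

        void map(String userId, String event) {   // runs once per record: reuse the connection
            output.add(userId + "\t" + event + "\t" + store.get(userId));
        }

        void cleanup() {                          // runs once per task: release the connection
            if (store != null) store.close();
        }
    }

    /** Drives the lifecycle the way the framework would for one task. */
    static List<String> runTask(String[][] records) {
        LookupMapper mapper = new LookupMapper();
        mapper.setup();
        try {
            for (String[] r : records) mapper.map(r[0], r[1]);
        } finally {
            mapper.cleanup();
        }
        return mapper.output;
    }

    public static void main(String[] args) {
        String[][] records = { {"user1", "login"}, {"user2", "purchase"}, {"user1", "logout"} };
        runTask(records).forEach(System.out::println);
        System.out.println("connections opened: " + SettingsStore.openCount);
    }
}
```

The point of the design is that the number of connections grows with the number of map tasks, not with the number of input records, which is what keeps the database from running out of connections.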

score 1 · Answer 1 · edited May 23 '17 at 12:32

In my opinion, you can't compare apples with bananas. HBase is schema-less and, in terms of the CAP theorem, its main focus is CP (consistency and partition tolerance), whereas an RDBMS is CA (consistency and availability). Please see my answer.

An RDBMS has a schema, is centralized, supports joins, supports ACID, and supports referential integrity.

HBase, on the other hand, is schema-less, distributed, doesn't support joins, and has no built-in support for ACID transactions across rows (its atomicity guarantees apply per row).

Now you can decide which one suits which purpose based on your requirements.

Hope this helps!
