Using a database with a Hadoop cluster

Asked Apr 04 '18 at 20:14

Active Apr 04 '18 at 20:14

Viewed 41 times

Currently I have a small Hadoop cluster that runs a MapReduce job on my input data and generates some output. I would like to store this output in a database so that it can be queried for analysis. I would like the database to provide ACID properties, so that any change made on one node is reflected across the entire cluster. Then, if a node fails, other nodes will still hold the current data.

I have been looking into options such as Hive with its ACID transactions, but is that all I would need to accomplish this?

asked Apr 04 '18 at 20:14

Nick_4810

Hive requires a metastore database such as MySQL or Postgres to accomplish this. If you have one of those anyway, why not just use a clustered deployment of it? – OneCricketeer Apr 05 '18 at 00:20
Thanks, that's what I was thinking. I am still new to this area, so any help is appreciated. – Nick_4810 Apr 05 '18 at 04:40
I personally don't have much experience with Hive ACID transactions, but I see many people using Cassandra, HBase, or Couchbase instead of Hive for simple updates like the ones you're asking about. – OneCricketeer Apr 05 '18 at 04:46
Hive provides ACID transactions only for tables stored in the ORC format. You might consider looking into other databases, as mentioned by @cricket 007. – Deepan Ram Apr 05 '18 at 10:32
By the way, Hive isn't actually a database. It's only a SQL engine on Hadoop. – OneCricketeer Apr 05 '18 at 12:08
Yes, I have looked into HBase more and I think it will work for me. – Nick_4810 Apr 06 '18 at 06:05


0 Answers