Folks,
I have recently started reading about NoSQL databases, as I am currently working on a data warehousing application.
I have already read the basics and have the following questions.
Question 1) How is an entire row retrieved in a column-oriented database, given that data from the same column is stored together?
Let's say we store the following data:
key 1 : { name : test, value : 5 }
key 2 : { name : test1, value : 10 }
Internally, a column-oriented DB would store it like this: test|test1 together and 5|10 together.
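To make the layout above concrete, here is a minimal sketch that pivots the two example records into per-column arrays. It assumes a simplified in-memory model where every row has every column and entry i of each column belongs to the i-th key; the names `rows` and `to_columns` are hypothetical, and real stores add compression, sorting and on-disk files on top of this.

```python
# Row-oriented view of the example: one record per key.
rows = {
    "key1": {"name": "test", "value": 5},
    "key2": {"name": "test1", "value": 10},
}


def to_columns(rows):
    """Pivot row records into per-column lists (a simplified columnar layout).

    Assumes every row has the same columns. Entry i of every column belongs
    to the i-th key, so values of one column sit next to each other,
    like test|test1 and 5|10 above.
    """
    keys = list(rows)  # dicts keep insertion order (Python 3.7+)
    columns = {}
    for key in keys:
        for col, val in rows[key].items():
            columns.setdefault(col, []).append(val)
    return keys, columns


keys, columns = to_columns(rows)
print(keys)     # ['key1', 'key2']
print(columns)  # {'name': ['test', 'test1'], 'value': [5, 10]}
```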
So if we have to retrieve the data for key 1, how does that happen? (A and B are my guesses.)
A) If it has to pick the data from each column's storage separately, then it will be very costly.
B) Is there an indexing mechanism to fetch the data of all columns for a given row key?
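One way to picture options A and B together is a toy store that keeps a row-key index pointing at a position, then reads that same position from every column. This is a hypothetical sketch (the `ColumnStore` class and its methods are made up for illustration, not any product's API); it assumes positional alignment across columns, so a row lookup costs one index probe plus one read per column rather than a scan.

```python
from typing import Any, Dict, List


class ColumnStore:
    """Toy in-memory column store (hypothetical, for illustration only).

    Each column is a separate list; the i-th entry of every column belongs
    to the same row. key_index maps row key -> position, standing in for
    the kind of row-key index asked about in option B.
    """

    def __init__(self, column_names: List[str]) -> None:
        self.columns: Dict[str, List[Any]] = {name: [] for name in column_names}
        self.key_index: Dict[str, int] = {}

    def insert(self, key: str, row: Dict[str, Any]) -> None:
        # Assumes append-only writes with unique keys, so positions stay aligned.
        if key in self.key_index:
            raise ValueError(f"duplicate row key: {key}")
        position = len(self.key_index)
        self.key_index[key] = position
        for name, values in self.columns.items():
            values.append(row.get(name))  # a missing column is stored as None

    def get_row(self, key: str) -> Dict[str, Any]:
        # One index lookup, then one positional read per column
        # (option A's per-column access, made cheap by knowing the offset).
        position = self.key_index[key]
        return {name: values[position] for name, values in self.columns.items()}


store = ColumnStore(["name", "value"])
store.insert("key1", {"name": "test", "value": 5})
store.insert("key2", {"name": "test1", "value": 10})
print(store.get_row("key1"))  # {'name': 'test', 'value': 5}
```

The per-row cost grows with the number of columns, which is why column layouts favour reading few columns over many rows rather than fetching whole rows.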
Question 2)
I was reading through some docs and found that column-oriented databases are better suited to running aggregate functions on a single column, because less I/O is needed.
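The I/O argument can be illustrated by counting how many stored fields a SUM over one column touches in each layout. This sketch uses "fields read" as a stand-in for disk I/O, which is a simplifying assumption; `row_layout`, `column_layout` and both functions are hypothetical names.

```python
# The same data in two hypothetical physical layouts.
row_layout = [("key1", "test", 5), ("key2", "test1", 10)]  # whole rows stored together
column_layout = {
    "key": ["key1", "key2"],
    "name": ["test", "test1"],
    "value": [5, 10],
}


def sum_value_row_store(rows):
    """SUM(value) over a row layout: every field of every row is read."""
    fields_read = 0
    total = 0
    for row in rows:
        fields_read += len(row)  # the whole record is read together
        total += row[2]          # 'value' is the third field
    return total, fields_read


def sum_value_column_store(columns):
    """SUM(value) over a column layout: only the 'value' column is read."""
    values = columns["value"]
    return sum(values), len(values)


print(sum_value_row_store(row_layout))        # (15, 6)
print(sum_value_column_store(column_layout))  # (15, 2)
```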
I did not find proper support for aggregate functions like SUM, AVG, etc. in NoSQL column-oriented stores like Cassandra and HBase (there could be some tweaking/hacking/extra code writing, as in the questions below):
- How does Apache Cassandra do aggregate operations?
- realtime querying/aggregating millions of records - hadoop? hbase? cassandra?
- How to use hbase coprocessor to implement groupby?
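The "extra code" approaches in those questions generally compute partial results close to the data and merge them afterwards. Below is a generic sketch of that pattern in plain Python, not Cassandra or HBase code: the `shards` lists are hypothetical stand-ins for data held by different nodes or regions. Each shard returns (sum, count) rather than an average, because averages of averages are wrong when shards have different sizes.

```python
from typing import Iterable, List, Tuple


def partial_aggregate(values: Iterable[float]) -> Tuple[float, int]:
    """Runs 'near the data' (e.g. per node or region): returns (sum, count)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def merge_partials(partials: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Combines per-shard partials into a global SUM and AVG."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total, total / count


# Hypothetical shards of the 'value' column spread over three nodes/regions.
shards = [[5, 10], [20], [1, 2, 4]]
partials = [partial_aggregate(s) for s in shards]
print(merge_partials(partials))  # (42.0, 7.0)
```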
Question 3) How do joins happen internally in a column-oriented database, and is it advisable to do them?
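For reference, here is a textbook hash join applied to two tables held in column layout. It is a generic illustration, not a description of how any particular store implements joins; `orders`, `customers` and `hash_join` are hypothetical. Only the key columns are touched while matching; the other columns are fetched by position afterwards, and only for matching rows.

```python
from typing import Any, Dict, List

# Two hypothetical tables in column layout (the i-th entries form one row).
orders = {"order_id": [1, 2, 3], "customer_id": [10, 20, 10], "amount": [5, 7, 9]}
customers = {"customer_id": [10, 20], "name": ["test", "test1"]}


def hash_join(left: Dict[str, List[Any]], right: Dict[str, List[Any]],
              key: str) -> Dict[str, List[Any]]:
    """Equi-join two column-layout tables on `key` with a hash join.

    Build phase: hash the right table's key column -> row positions.
    Probe phase: scan only the left table's key column, collecting
    matching (left, right) position pairs. Assumes no column names
    other than `key` are shared between the two tables.
    """
    build: Dict[Any, List[int]] = {}
    for pos, k in enumerate(right[key]):
        build.setdefault(k, []).append(pos)

    pairs = [(lpos, rpos)
             for lpos, k in enumerate(left[key])
             for rpos in build.get(k, [])]

    # Materialize the output columns by position, only for matched rows.
    result = {name: [col[l] for l, _ in pairs] for name, col in left.items()}
    for name, col in right.items():
        if name != key:
            result[name] = [col[r] for _, r in pairs]
    return result


print(hash_join(orders, customers, "customer_id"))
# {'order_id': [1, 2, 3], 'customer_id': [10, 20, 10], 'amount': [5, 7, 9], 'name': ['test', 'test1', 'test']}
```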