0

Folks,

I am currently have started reading about NOSQL related DB as currently working on Database warehousing related application.

I have following questions. I have already read basics.

Question 1) How entire raw is retrived in column oriented database as data with same column is stored together ?

lets say we store data in following format so internally it will be stored like this in column oriented DB.

test|test1 together and 5|10 together.

key 1 : { name : test, value : 5 } key 2 : { name : test1 , value : 10 }

So if we have to retrive data for key1 how does it happen ? (A and B is my guess)

A) If it has to pick data from each column storage seperately then it will be very costly

B) is there any indexing mechanism to fetch this data for all columns for given raw key ?

Question 2 )

I was reading through some of the docs and found column oriented Database is more suited to run aggregation function on single column as I/O will be less.

I didnot find proper support for aggregation function like SUM,AVG etc in NOSQL column oriented store like cassandra and HBASE. ( There could be some tweaking/hacking/more code writing like below)

How does Apache Cassandra do aggregate operations? realtime querying/aggregating millions of records - hadoop? hbase? cassandra? How to use hbase coprocessor to implement groupby?

Question 3 ) How the joins happens internally in column oriented database is it advisable to do ?

Community
  • 1
  • 1
user1927808
  • 577
  • 1
  • 10
  • 23

1 Answers1

0

Nice question, 1) In Cassandra if you are using cqlsh then it will look like as you store data in mysql or some other rdbms stores.

Connected to Test Cluster at localhost:9160.
[cqlsh 3.1.7 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> create keyspace test with replication={'class':'SimpleStrategy', 'replication_factor': 1
         <value>  
cqlsh> create keyspace test with replication={'class':'SimpleStrategy', replication_factor': 1};
cqlsh> USE test ;
cqlsh:test> create table entry(key text PRIMARY KEY, name text, value int );
cqlsh:test> INSERT INTO entry (key, name , value ) VALUES ( 'key1', 'test',5);
cqlsh:test> INSERT INTO entry (key, name , value ) VALUES ( 'key2', 'test1',10);
cqlsh:test> select * from entry;

 key  | name  | value
------+-------+-------
 key1 |  test |     5
 key2 | test1 |    10

cqlsh:test>

Note:- you can select rows using key or using some criteria on other column by using secondary indexes.

But in hbase the structure will look like following

rowkey | column family | column | value
key1   | entry         | name   | test
key1   | entry         | value  | 5
key2   | entry         | name   | test1
key2   | entry         | value  | 10

Note:- you can select each row using key or any column value its very easy.

2) Yes nosqls also supports batch operation only for DMLs.

3) Joins are not supported in none of nosqls datastores. They are not meant for joins.

Hope it will help you.

kkmishra
  • 699
  • 4
  • 10
  • Dear kkmishra, I would like to understand below on your note.Basically I am trying to understand internal of NOSQL to decide whether to use it for our new application or not. Note:-you can select rows using key or using some criteria on other column by using secondary indexes. – user1927808 Jun 09 '14 at 10:29
  • 1) How single raw is retrived in NOSQL ( I believe it has to check different disk blocks to retrive singleraw and it is not continuous.) is there any primary index like columnArray['rawKey'] = columnValue 2) How values are retrived using column condition I mean how secondary index is maintained ? can you give sample like I gave in the first. – user1927808 Jun 09 '14 at 10:36
  • 1) A single row key data resides on a single node, that is in case of fetch by single row key no need to read data from multiple nodes it will come from single node. 2) In case of secondary table Each node only indexes data that it holds locally, so in this case read will occur on single node as well. – kkmishra Jun 09 '14 at 17:16
  • Thanks for your response but apparantely my origional question was why aggregation functions are not suppored in CASSANDRA, HBASE etc. In my view aggregation function should do less I/O in NOSQL db. refer blog http://stackoverflow.com/questions/23532128/aggregation-queries-in-cassandra-cql – user1927808 Jun 11 '14 at 09:33
  • One more .http://stackoverflow.com/questions/3052952/how-does-apache-cassandra-do-aggregate-operations – user1927808 Jun 11 '14 at 09:35