In relational databases, we decide ordering when we write the fetch queries. But for Cassandra we have to do it when we are creating tables.
What's the reason behind this difference?
Ordering, or sorting, data is expensive in both time and space. The entire data set has to be processed to determine the order, and sorting cannot be done fully in a distributed fashion. The best comparison-based algorithms have complexity O(n * log n). In practice, quicksort is often used when the data fit in main memory (so no intermediate data has to be stored on disk or moved from another node). Its worst-case complexity is O(n * n), but its average case is O(n * log n), and in typical cases it performs better than merge sort or other O(n * log n) algorithms.
RDBMSs are usually not distributed, so performance is limited by disk I/O when the data do not fit in main memory. With distributed databases and distributed data, the data must also be moved between nodes, which, in general, can be very expensive.
It is not uncommon for queries to take considerable time in RDBMSs. That is why tools are provided to inspect query plans, so that queries can be tweaked or the necessary indexes added. In the worst case you have to materialize the results of the query, change the schema, or give up and move to another DBMS that is designed for analytical processing.
Cassandra has chosen a different approach: it focuses on performance and does not support operations that are expensive. Instead, it requires users to think about data usage and future queries in advance and to design the schema accordingly. To get an ordered result, you need to include the desired columns in the clustering key. However, the order is maintained per partition, not across partitions. The reason is the same as before: deciding the global position of a new record might require looking at data from other nodes.
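To make the per-partition ordering idea concrete, here is a minimal toy sketch in Python. It is not how Cassandra's storage engine actually works; the class `PartitionedTable`, its methods, and the sample keys are all made up for illustration. It only shows the principle: each row is placed in order inside its own partition when it is written, so reading a partition needs no sort step and never touches other partitions.

```python
import bisect
from collections import defaultdict


class PartitionedTable:
    """Toy model (hypothetical): rows grouped by partition key,
    kept sorted by clustering key inside each partition."""

    def __init__(self):
        # partition key -> list of (clustering_key, value), always kept sorted
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        # The ordering cost is paid at write time and only involves one partition.
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def read_partition(self, partition_key):
        # Rows come back already ordered; no sorting happens at query time.
        return list(self.partitions[partition_key])


table = PartitionedTable()
table.insert("sensor-a", 3, "c")
table.insert("sensor-b", 2, "y")
table.insert("sensor-a", 1, "a")
table.insert("sensor-a", 2, "b")

rows_a = table.read_partition("sensor-a")
rows_b = table.read_partition("sensor-b")
print(rows_a)
print(rows_b)
```

Getting one globally ordered result out of this model would mean merging every partition, which in a real cluster means pulling data from many nodes. That is exactly the expensive operation Cassandra avoids.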
This limited query support is what allows Cassandra to provide performance guarantees.