What makes Cassandra (and NoSQL in general) a better solution to an RDBMS?

Question

Well, NoSQL is a buzzword right now so I've been looking into it. I'm yet to get my head around ColumnFamilies and SuperColumns, etc... But I have been looking at how the data is mapped.

After reading this article, and others, it seems the data is mapped in a JSON like format.

Users = {
    1: {
        username: "dave",
        password: "blahblah",
        dateReged: "1/1/1"
    },
    2: {
        username: "etc",
        password: "blahblah",
        dateReged: "2/1/1",
        comment: "this guy has a comment and dave doesns't"
    },
}

The RDBMS format would be:

Table name: "Users"

id | username | password | dateReged | comment
---+----------+----------+-----------+--------
 1 |  dave    | blahblah |  1/1/1    |
---+----------+----------+-----------+--------
 2 |  etc     | blahblah |  2/1/1    | this guy has a comment and dave doesn't

Assuming I understand this correctly and my above examples are right, why would I choose the RDBMS design over the NoSQL design? Personally, I'd much rather work with the JSON structure... Does this mean I should choose NoSQL over, say, MySQL?

I guess what I'm asking is "when should I choose NoSQL over RDBMS?"

On a side note, as I've said, I'm still not fully understanding how to go about implementing a Cassandra database. Ie, how do I create the above Users table in a new database? Any tutorials, documentation, etc you could point to would be great. My google'ing hasn't turned up much in terms of 'starting from scratch'...

I saw that link earlier today. I'll be sure to watch it when I get home ;) — dave, Sep 09 '10 at 01:19
possible duplicate of [Why nosql with cassandra instead of mysql ?](http://stackoverflow.com/questions/3640899/why-nosql-with-cassandra-instead-of-mysql) — Thilo, Sep 09 '10 at 01:25
Well, NoSQL is obviously not a guaranteed win in all cases - http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/ — Justin, Sep 09 '10 at 04:35
Possible duplicate of [What is NoSQL, how does it work, and what benefits does it provide?](http://stackoverflow.com/questions/1145726/what-is-nosql-how-does-it-work-and-what-benefits-does-it-provide) — Trevor Boyd Smith, Feb 20 '17 at 15:43

score 18 · Answer 1 · answered Sep 09 '10 at 05:20

If you are google, then you might be in a position where a NoSQL would be easier on you than a RDBMS. Since you are not, the many advantages an RDBMS provides you will probably be of some use. Significantly, on a single node, NoSQL offers absolutely no advantages over RDBMSes. RDBMSes offer lots of advantages over NoSQL, though. what are they?

RDBMSes use some pretty deep magic to understand the data it owns, and the data you are asking for, in such a way that it can return that data in the most efficient manner possible. If you didn't ask about some column, the rdbms doesn't waste any effort retrieving it. If you are interested in rows that have fields in common across two tables, (this is a join, btw), the RDBMS doesn't have to check every single pair of rows for matches, or what a NoSQL db usually does is just give you everything and make you do the checking. with a RDBMS, you can usually construct queries that are actually 'about' the data you are using, like "if the date is a tuesday", and if your indexes support it (if you do that query alot then you would add such an index) you can get those rows efficiently.

There is another reason why RDBMSes are nice. Transactions are easy on RDBMSes, but are much harder to get right on NoSQL databases. Supposing you are implementing a blogging engine. Suppose the post title (which appears in the URL) needs to be unique across all posts. In an RDBMS, you can easily be sure that you won't get this wrong accidentally. With a NoSQL database, if it does support some kind of transactional integrity, it's usually at the shard level, anything that could possibly require that kind of integrity must be on the same shard. since any pair of users could possibly be posting at the same moment, then every users' post must be on the same shard to get the same effect. Well, then you don't get any benefit at all from NoSQL.

'Significantly, on a single node, NoSQL offers absolutely no advantages over RDBMSes. RDBMSes offer lots of advantages over NoSQL, though. what are they?' - erm No. One example: write times to MongoDB are significantly faster than write times to MS SQL server. Its a bit misleading to stipulate there are NO advantages. It may not be the right fit for the purpose, but if you're after speed, there is an advantage there. — Michael Shimmins, Sep 09 '10 at 05:25
MongoDB is schemaless, this is also a big difference on a single node. — TTT, Sep 09 '10 at 20:33
Yes, schemaless is different. The question is really about why would this be a good thing? I'm a bit suspicious of a schemaless setup. In theory, it makes changes easier. At the level of the database, it certainly does, you don't have to go to any length to add or remove properties at that level. On the other hand, it doesn't in any way make the semantic consequences of database migration any easier. What is the correct behavior when processing the fields that may be null? schemaless doesn't alleviate that in the slightest. — SingleNegationElimination, Sep 12 '10 at 01:45

Thomas Mueller · Answer 2 · 2010-09-09T08:36:33.453

16

The main advantage of NoSQL is horizontal scalability and distributed storage. That means you can have a large number of 'cluster nodes' and write to them in parallel. The cluster will ensure changes are propagated to the other cluster nodes eventually (eventual consistency).

NoSQL is not so much about SQL (the term means "not only SQL"). In fact, some NoSQL products do support a subset of SQL. The reason the data format is different (JSON or list of property / value pairs versus tabular data) is: within relational databases, the number of columns (and column names) is defined in a central place, which doesn't work well with horizontal scalability (you would need to stop all cluster nodes for schema changes). Also, joins are not supported as much because that would break horizontal scalability (data from multiple cluster nodes may need to be read, if the data is distributed).

edited Sep 09 '10 at 08:36

answered Sep 09 '10 at 04:29

Thomas Mueller

48,905
14
116
132

6

And Oracle, DB2 , SqlServer, Teradata etc. dont support clustering ?? Well not before 1992 anyway. – James Anderson Sep 09 '10 at 06:54
5

They do support clustering, but they don't support horizontal scalability as well, because they try to support all the ACID properties. The NoSQL products don't try to support all ACID features. Some say NoSQL really means NoACID: http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html – Thomas Mueller Sep 09 '10 at 08:33
@Thomas Mueller: Which is exactly why so many people say NoSQL is BAD. And it doesn't support joins either, and therewith forces denormalization, and therewith creates redundancy which ("almost" necessarely) leads to data consistency problems. Plus eventual consistency is bad. If the server crashes, it should have commited all data to disk, when it said it did. When it eventually commits the data to disk (but says it's commited before), then bad things are gonna happen ... – Stefan Steiger Apr 01 '16 at 08:48
@StefanSteiger yes. Sometimes those bad things are not perceived to be all that bad, at least at the beginning, and horizontal scalability is perceived to be more important. And sometimes people just like to have NoSQL in their resume because it's cool :-) – Thomas Mueller Apr 01 '16 at 09:53

HLGEM · Answer 3 · 2012-03-08T20:40:45.150

NoSQl databases are fine for some websites where you don't need transaction or consistency where all you are doing is presenting some data (but until you get really really large, they are not really very needed).

But if you need to enforce financial rules (or other complex data integrity rules) or internal controls or reporting and aggregating data for reporting, you need an RDBMS. I'll bet even Google uses RDBMS' for their own HR and financial data, etc.

For some web applications, you might even want a combination of both, the nosql database for some types of information, the transactional relational database for orders and other things where transactional consistency is a must.

If you develop web sites, I think you need to thoroughly understand both types of databases and the needs behind them before choosing how to handle any new functionality.

It seems to me that you have almost no knowledge of relational databases and would rather do what is easier for you personally than what is right for the project. Maybe I'm not reading that correctly, but anyone who never uses joins is suspect in terms of understanding relational databases.

You don't decide between these two based on which one seems easier to understand or which is the buzzword of the month, you decide them based on the functionality you will need, not just for the user interface but for administrative tasks, reporting, financial or other types of data auditing, government regulation, data recovery in case of a hardware failure, etc.

score 5 · Answer 4 · answered Sep 09 '10 at 05:04

RDBMS' are all about consistency. They do a great job on data that gets churned alot with transactions. See also ACID (atomicity, consistency, isolation, durability). Sometimes you don't need all that, like when storing data from logs or working on data that's not going to change, just accumulate.

NoSQL databases let you relax the requirements for transactions and get better performance (as well as scale to large distributed storage silos easier).

score 4 · Answer 5 · edited Nov 23 '14 at 11:59

4

Answer is easy. If you need data storage - use NoSQL, if you need more features then just storing data - use RDBMS.

edited Nov 23 '14 at 11:59

Joshua Moore

24,706
6
50
73

answered Nov 23 '14 at 11:49

Tommix

443
4
15

score 4 · Answer 6 · answered Sep 09 '10 at 01:25

The advantage fo NoSql is that its simpler and if you have your OO blinkers on it fullfills all your persistence needs.

The advantage of SQL based realtional database is that you can easily re-use and extend your data in ways that were not envisaged in the original design. Also "Object" databases tend to perform very badly (even if its possable) when you want to do the equivalent of SQLs aggregate queries like COUNT, SUM, AVG.

Googles BIGTABLE which is the biggest OO database anywhere (and probably the biggest database period) also supports SQL and sql features like indexing and strong typing.

score 3 · Answer 7 · answered Sep 08 '14 at 16:58

As many books about NoSQL mention, it's not about which database is better than the other. It's more what you need.

As everyone say in the other answers, many NoSQL databases support horizontal scalability and are focused on high availability but they are not always the best fit for your needs.

for example, Cassandra is great to add or remove nodes from a cluster, allowing that high scalability. But when you compare Cassandra with MySQL in an environment with just one node (one server), and with no distributed architecture, there isn't a lot of different, since the main advantages of Cassandra are not used.

Now, why should you use SQL? The most common reason is transaction management. Currently, no popular NoSQL database natively supports transactions. You can emulate them, but they are not part of the native functionality as in most SQL databases.

For Cassandra, there is a full and free training in https://academy.datastax.com

There you won't only find trainings to install and configure Cassandra, but to use its tools. It even gives you completion certificates.

Datastax has its own distribution of Cassandra, but it follows all the same guidelines as the Apache project; it offers some extra tools.

score 3 · Answer 8 · answered Sep 09 '10 at 01:20

3

I guess what I'm asking is "when should I choose NoSQL over RDBMS?"

[Caveat: I've never read about NoSQL before]

According to Wikipedia, NoSQL isn't good at joins: which implies (to me) no referential integrity and no normalization.

answered Sep 09 '10 at 01:20

ChrisW

54,973
13
116
224

To be honest, my knowledge of SQL is pretty poor. I think I've used the JOIN keyword once. Only once. Losing that wouldn't really affect me. – dave Sep 09 '10 at 01:22
@dave: If you don't understand SQL (or, more importantly, its foundations in relational algebra) then obviously the SQL and NoSQL solutions will seem very similar. The differences don't really begin to manifest until you have a *lot* of data (and/or a *lot* of transactions). – Daniel Pryden Sep 09 '10 at 01:35
@dave Joins are associated with [database normalization](http://en.wikipedia.org/wiki/Database_normalization): the pictures in the right-hand margin of that article are a quick introduction. – ChrisW Sep 09 '10 at 01:37

score 1 · Answer 9 · answered Jan 15 '14 at 03:14

Cassandra in and of itself is not better than an RDBMS. It is better under some circumstances. An RDBMS is vastly superior for transaction processing, master data management, reference data, data warehousing and (some forms of) BI.

Use NOSQL if your application requires a flexible schema, variable length rows, variable types of columns, eventual integrity, horizonal scalability on commodity servers, and high availability achieved by means of a distributed architecture.

NOSQL does not do joins for several reasons: you already joined the data before the NOSQL file was loaded so there is no need to; because a distributed join over far-reaching servers would be resource intensive. The first reason above is simple: you have embedded all the data you need into a single structure. If you do not embed the data and have to link, don't expect great performance out of it. Linking is a euphemism for application-provided joining without the benefit of consolidating the data as a join does. Assuming hashing a key is the method of data distribution, different records that have the same hash key would be collocated. Thereby if joining were permitted, the joined data would all be on the same server.

It's not just black and white.

score 1 · Answer 10 · answered Sep 09 '10 at 01:46

1

The simplest answer I can think of is: When your data doesn't fit a relational model.

answered Sep 09 '10 at 01:46

T3hc13h

602
6
15

4

I have seen several things which do not fit into an OO model, never yet seen anything that could not be modeled in a relational DB. – James Anderson Sep 09 '10 at 04:14
1

@James Anderson [Hierarchies (trees)](http://stackoverflow.com/questions/1085287) can be modeled in a relational DB, but it's a bit difficult/special to do that. – ChrisW Sep 09 '10 at 08:53
2

You certainly CAN model just about anything in a relational db, but in many cases you really have to contort your data. – Nils Weinander Sep 09 '10 at 09:07
@JamesAnderson, one thing that is difficult to model in a relational database is when you have a different number of columns for each data point such as for medical tests. You can use an EAV table for that, but if you do then you would likely be better off using a NoSQL database to do the same thing faster. – HLGEM Apr 30 '15 at 14:08

score 1 · Answer 11 · answered Sep 09 '10 at 17:21

1

I gave a talk at OSCON about when NoSQL can be the right choice, and some of the different sub-categories to be aware of: http://assets.en.oreilly.com/1/event/45/The%20NoSQL%20Ecosystem%20Presentation.pdf

answered Sep 09 '10 at 17:21

jbellis

19,347
2
38
47

@jbelis: "Relational databases don't scale", "Relational databases are slow" These are claims that could apply to certain DBMS products. They have nothing to do with the relational model. It would be quite reasonable to make a "NOSQL RDBMS" (ie. relational, not SQL) that didn't have the same perceived disadvantages. As I have often observed, NOSQL enthusiasts sometimes seem overly keen to throw out the relational baby with the SQL bathwater :) – nvogel Sep 10 '10 at 18:53
2

Odd how there are relational databases with trillions of records yet people still claim they don't scale. They only don't scale when you are incompetent at database design. – HLGEM Jul 27 '11 at 14:26

What makes Cassandra (and NoSQL in general) a better solution to an RDBMS?

11 Answers11

Linked