20

After developing software for about 5 years now, I have spent probably atleast 20% and perhaps up to 40% of that time simply making a RDBMS able to save and retrieve complex object graphs. Many times this resulted in less than optimal coding solutions in order to make something easier to do from the database side. This eventually ended after a very significant amount of time spent in learning NHibernate and the session management patterns that are part of it. With NHibernate I was able to finally eschew the large majority of 100% wasted time of writing CRUD for the 1000th time and use forward generation of my database from my domain model.

Yet all of this work still results in a flawed model where my database is merely the best attempt by SQL to imitate my actual object. With document databases this is no longer the case as the object becomes the document itself instead of merely emulating the object through tables and columns.

At this point I'm really starting to question why would I ever need SQL again?

What can really be done substantially better with SQL than a document database?

I know this is somewhat of leading into a apples to oranges comparison especially when you factor in the various types of NoSQL databases having widely different feature-sets but for the sake of this argument base it on the notion of NoSQL databases can inherently query objects correctly and not on the limitations of a key value store. Also leave out the reporting aspect as that should generally be handled in a OLAP database unless your answer includes a specific reason you would not use a OLAP database for it.

Chris Marisic
  • 32,487
  • 24
  • 164
  • 258
  • 3
    You can't really ignore the reporting aspect of RDBMS'. Agreed that reporting isn't needed for most cases, but where it is needed, those joins can be quite handy. – Anurag Jul 20 '10 at 22:11
  • 2
    Hmmm, finger on the button when it becomes to argumentative _again_. – Wrikken Jul 20 '10 at 22:13
  • 1
    SQL is usually far superior in finding relations between data, statistical analysis and in safe transactions, and has a lot less duplication of data. Some reading: http://www.cattell.net/datastores/index.html SQL and NoSQL both have their uses & places, anyone trying to use one tool for all problems either has a very limited scope of problems, or a hard time hammering a screw in. – Wrikken Jul 20 '10 at 22:15
  • @Anurag I strongly agree with you that reporting is an important function, the optimal solution for reporting is to use an OLAP database regardless of the fact many use their normal database for reporting also. This is why I specifically brought up the reason to exclude this argument, because reporting off a transactional database is already *almost* wrong to start with. – Chris Marisic Jul 20 '10 at 22:25
  • 3
    -1 Doesn't really seem to ask a question. Title makes an unsupported provocative claim ("NoSQL is prevalent" which from the SO perspective it isn't) Selects own zero vote answer. – Conrad Frix Aug 10 '10 at 18:40
  • 2
    @Chris. No its not a question, its a declaration about your state of mind. The question mark is probably incorrect unless you're asking if you are, or are not "starting to question sql". Later you do ask subjective questions using statements like "outshine" and "substantially". IMHO it seems like you want the SO community to validate what you already believe. It seems like a blog would be a better venue for your thoughts, like this. http://blogs.computerworld.com/15510/the_end_of_sql_and_relational_databases_part_1_of_3 – Conrad Frix Aug 11 '10 at 15:48
  • @ChrisMarisic many reasons were provided, you are obviously not interested in those as they completely demolish your belief that NoSQL solutions can even be a replacement for a good SQL RDBMS (oracle, postgreSQL). This question, and your comments, should be removed from SO as they are without value to the community. – Morg. Oct 26 '13 at 12:09

10 Answers10

31

At Amazon I worked with a lot of code. Most of the code I worked one was code nobody really understood anymore. It was riddled with special case handling that wasn't well understood because it was an accretion of quick patches over a long period of time. If you wanted to fully understand the effect of a change you were making you were out-of-luck. In essence, you were forced to add to the accretion.

I also worked with a lot of data. The structure of the tables in SQL made excellent long-term documentation for the data. The database was relatively easy to work with directly, and the structure of the data made sense. There were people who's job it was to manage the structure and integrity of the data.

I fear that a NoSQL database, with its lack of well-documented structure, would slowly acquire all the evil qualities of the code I worked on. It would end up filled with data from old structures that nobody really understood anymore, and become a vast patchwork of mostly useless garbage.

I see the main benefits of SQL databases as the forced documentation that maintaining the database structure and consistency rules requires. Those benefits do not have an easy short-term measure like speed of a query or transactional consistency. They are long-term benefits that affect the usefulness of your data over an extended period of time.

As a second, related point, I find it more useful, when using ORMs and the like, to map out my data and then decide how that will translate into objects in the application I'm writing. The data and its relationships represent a long-term archival structure that may be used for a variety of purposes.

The structure of the object relationships in the application are there for the purposes of that application. A given set of data represented in SQL tables and relationship constraints will have many possible object models that represent it in an application, and each of those object models will reflect the goals of that particular application. But the data and its structure exist independently of any given ephemeral use that might be made of them.

I see the arguments people make about 'reporting' as being arguments that different applications can usefully view the same set of data in very different ways.

Personally, I think SQL is a good model to use directly for archival data, infrequently modified data, or data with extremely high consistency requirements. And I think that I will continue to use relational algebra to define the overall structure of my data even if I'm storing it in a NoSQL database. And I will not change the structure of the data in the NoSQL database without first modifying the relational structure describing it. This will allow me to map my NoSQL databases back to SQL so I can still use SQL for long-term storage and warehousing and force me to maintain the data structures in a well documented form.

Doing things this way will also assist me when I have to pull data out of the NoSQL database for use in applications that were not envisioned when the database was created.

Of course, there is some data who's structure naturally fits NoSQL and where generating a relational schema for it would be pointless. For example, storage of actual documents, storage of pictures or other media, or other large blobs of data that has no structure that might be useful to represent. This distinction is very tricky though. Pictures and movies do have structure to them, just not generally structure you need to store in a database. A blog post may have structure as well if you have a system designed to try to read and understand it, and that may well be structure you want to maintain a record of.

Omnifarious
  • 54,333
  • 19
  • 131
  • 194
  • "And I will not change the structure of the data in the NoSQL database without first modifying the relational structure describing it. This will allow me to map my NoSQL databases back to SQL so I can still use SQL for long-term storage and warehousing." This seems like a nearly inordinate amount of work, wouldn't this effort better spent just building and maintaining a proper import process for the datastore -> datawarehouse instead? – Chris Marisic Aug 06 '10 at 14:25
  • 3
    @Chris Marisic: And what happens when the importer program becomes something that someone needs to spend a few weeks to understand? No, IMHO, it's vitally important that you always have a really good handle on exactly what data you have in your database, what it means, and how it relates to the other data. Keeping an SQL schema (or any schema really) of it outside the database is a means for achieving that. – Omnifarious Aug 06 '10 at 15:00
  • I awarded you the bounty because using a database schema as a rigid model of your domain while might not be something I'd do definitely offers a case where Sql is substantially better suited for. – Chris Marisic Aug 10 '10 at 12:53
29

Relational data modeling is a formal, mathematical solution for representing complex data without redundancy and without allowing anomalies. You can design an optimal database design from the data relationships themselves. This is the process of relational database normalization.

Non-relational data modeling has no formal way to define the best database structure from the data. You can design a database based on your anticipated usage; that is, your queries determine the best data organization, not the data itself.

In non-relational databases, you can never be sure that data conforms a certain document structure. You could have documents left over in the database from an earlier revision. So your application code had better be able to "discover" the structure of each document, perform conversions if necessary, and hope that references between data collections are satisfied.

In relational databases, you can depend on data integrity being an integral part of the model. If you design for normalization and you set up constraints properly, you know you'll never have orphans or data anomalies.

Non-relational databases give you one type of efficiency, as you're designing the database. Relational databases give you another type of efficiency, as you're using the database.

That said, the specific type of problem you've been working with -- object graphs -- is tricky to accomplish efficiently with plain SQL. But I think you'll find it's not much easier with NoSQL databases.


Re your comment: Granted, consistency is not a priority for every app. That doesn't make the value of consistency "insubstantial" for the apps where it is important.

You asked about why you would use relational databases -- you'd use them when the benefits of relational databases fit the priorities of your project.

Don't drive a nail with a screwdriver, and don't turn a screw with a hammer. There's an appropriate tool to solve each type of problem.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    I find some this argument's primary points to be somewhat unsubstantial, the notion of orphaned data can be handled just as correctly through your application as the database itself. Besides orphaned data being allowed/not allowed is more of a business decision to start with. The argument of versioning documents just as clearly correlates to the notion of versioning database schemas. I don't see how these 2 factors are in anyway that different between each other. – Chris Marisic Jul 20 '10 at 22:21
  • 2
    There are document stores that meet the ACID definition. – Chris Marisic Jul 20 '10 at 22:36
  • 2
    There's plenty of data patterns and consistency criteria not expressible in relational model. Like anything involving transitive closures (every node can be reached from root node) - quite ironic for something that calls itself "relational". Things that can be cleanly modeled relationally are rare outside textbooks. – taw Aug 05 '10 at 05:32
  • I'd really like to know why people are upvoting this answer. – Chris Marisic Aug 05 '10 at 13:57
  • 10
    @Chris Marisic: because it describes the differences between RDBMS and NoSQL. Each is a tool with pros and cons: neither is the end in itself. If you didn't want a balanced answers or someone to defend RDBMS then you should have stated you only want answers that agree with you. Or not asked. – gbn Aug 08 '10 at 12:06
  • @gbn IMO this post seems to be made with inaccurate assumptions about NoSQL databases, most specifically that it is inherently not capable of supporting ACID fully. – Chris Marisic Aug 09 '10 at 12:20
  • @Chris Marisic: I'm interested if you have an example of a non-relational database that supports consistency. Specifically, constraints for referential integrity. I'm not denying there may be one such implementation, but it is not typical among databases using the NoSQL brand. – Bill Karwin Aug 10 '10 at 15:30
  • @Chris Marisic: You keep mentioning ACID, but I'm not talking about ACID. I'm talking about consistency. Perhaps I should have used the word *integrity*. I'm talking about data modeling, not transactions. I don't believe any schemaless database can enforce data integrity. – Bill Karwin Aug 11 '10 at 06:24
  • Yes you can achieve referential integrity and store references to other documents inside a document. However I assume you could probably physically delete documents that would cause data to be disjoint, but this goes back to my earlier statement this is a business decision on whether this is wrong or not. If it's wrong you make your application not allow it and don't do it as an admin. – Chris Marisic Aug 11 '10 at 12:25
  • @Chris Marisic: Ah, the "do it in the app" argument. My point is that when you can define RI constraints in a relational database, you don't have to worry about whether your documents contain broken references. Performing that enforcement in the app is insufficient, because apps can have bugs. Maybe another developer is writing his own app to access your same data, and he doesn't know which constraints to implement. So you can never be sure your data has integrity unless the database is in charge of enforcement. And that also means you need to define a schema. – Bill Karwin Aug 11 '10 at 15:37
  • "So you can never be sure your data has integrity unless the database is in charge of enforcement." Is bullshit. Not to mention as a developer I could easily execute sql to drop the constraints and do what I wish anyway. Intentionally altering data is not an error, it's a business action whether a correct or incorrect one. Not to mention when the database dictates its "constraints" you end up with inserts that fail you have 100% data loss. I'd much rather have a partial document instead of 100% data loss. – Chris Marisic Aug 12 '10 at 13:28
  • I would find your argument to be the straw man argument. You're stating an application isn't capable of enforcing referential integrity, what else is a database other than an application? – Chris Marisic Aug 12 '10 at 16:43
  • 1
    @Chris Marisic: I did not say an application isn't capable of enforcing RI. Of course your apps are bug-free and flawless. But what about the schmuck in the next cubicle? If he can't program *his* app to do the correct RI enforcement, you have a problem. That's when it's advantageous to have a database schema that blocks invalid changes. Remember, you asked for a case where an RDBMS has an advantage over NoSQL/schemaless. – Bill Karwin Aug 12 '10 at 22:35
  • @ChrisMarisic RavenDB is not ACID. Besides, what consistency is there to speak of when you have no schema and no enforced types. You are naive in thinking that it's possible to write an ACID compliant application layer on top of your NoSQL database, AND be faster/ simpler than existing RDBMS. – Morg. Oct 26 '13 at 12:12
  • @Morg. thanks, do you have a reference that shows how RavenDB is not ACID? Because they claim all over the place that they support ACID transactions. But of course they would claim that. :-) I haven't used it, so I'm interested in getting an independent review. – Bill Karwin Oct 26 '13 at 14:31
  • @Morg. you're flatly wrong and that's all there is to it. There is consistency, you must persist valid json. RavenDB will not persist invalid json. The fact that the schema is defined by the data itself and is not a rigid schema such as RDMBS frequently uses does not mean there is no consistency. The rules are just different. Note there are similarly triggers in RavenDB, such as UniqueConstraints bundle that allows you to require certain fields be unique the same as you can define uniqueness in RDMBS. – Chris Marisic Oct 28 '13 at 17:07
  • No schema means no data consistency. There may be an encoding consistency, or whatever, but your data is not consistent and cannot be processed 100% of the time by a single processor. Triggers are not what you think obviously, and you should just stop pretending you know anything about databases right now, this is embarassing. – Morg. Oct 29 '13 at 09:37
  • @BillKarwin I thought it was rather obvious. Atomicity they may have, Isolation maybe, Durability maybe. Consistency ? there is no consistency in a document database so that must be bullshit. On top of that they insist that they have no locks, and you can't have ACID transactions without locks, from a purely logical standpoint. It uses async replication, and thus cannot be ACID as a cluster either. It's also very obviously a naively designed database, and I would be surprised if its authors even understand what ACID means. – Morg. Oct 29 '13 at 09:46
  • @Morg, right, it depends on how one defines consistency in the context of ACID. It's supposed to mean that any database constraints are satisfied before and after every transaction. But if there really aren't any database constraints (except for valid JSON as Chris mentions), then it's kind of a degenerate case, like in geometry when they say a point is a degenerate case of a circle. – Bill Karwin Oct 29 '13 at 15:21
  • Exactly. I don't believe it makes any sense to consider degenerate cases when talking about ACID, which was defined to talk about strong consistency and fixed schema databases. Otherwise you can just start saying anything that has no consistency to speak of will always respect ACID-C, not much value there. It's like saying you respect ACID-I in a context without any notion of concurrency, meaningless. – Morg. Oct 29 '13 at 16:59
5

it depends on what you are trying to do. when you need to do searching on different fields of your objects then SQL is good. if you don't need to do searching and you have very complex polymorphic tree like structures then SQL is horrible.

i've worked on app that allowed users to build web pages by joining little fragments together and the original serialization used key/value SQL tables. all the fragments had properties which were stored (fragment, property, value). so schemaless but still a lot of heavy lifting. probably the worst of both worlds because you don't really get much data validation from the database, it is very difficult to look at the tables and understand what is going on and there is still a lot of work to write it to the db and read it back.

we've also done a similar app but we learnt our lesson and we just take plain java classes and encode them using JSON. the user just edits their page in the front in a rich ui. clicks save and the whole page is sent back to the server as a json object. the server then does validation on the object to make sure all the constraints are correct which should always be true unless a user has been tampering or there is a bug in the code. then the object is written to a row by encoding to back to json.

this works well for us because we never want to deal with part of the object. we always deal with the whole of the object so JSON is not only easier but it is faster than doing the 40+ queries on each read we would have to do if it was properly normalized.

benmmurphy
  • 2,503
  • 1
  • 20
  • 30
1

NO-SQl Means - Non Relational Database!

NoSQL databases are better for large applications where scalability is important.

Nosql Database are Insertion and fetching both are fast

NoSQL databases are Fetching record is 10X faster in compare to sql database

calable and flexible datastore : This is the primary reason for moving away from relational database.

It can work on Structured and Unstructured Data. It uses Collections instead of Tables

Keshav Gera
  • 10,807
  • 1
  • 75
  • 53
0

Tooling is much better for SQL. NoSql has a buggy reputation. But even assuming those two differences even out...

I have the opposite experience from you in modeling complex objects in SQL. To say that tables and columns are at best an 'emulation' of your objects, that's a bit semantic. Any serialization of your objects would also be an emulation: While a document database or xml or whatever may feel like a better emulation than tables/columns, it tends to be less powerful technology. ORMs have helped immensely to bridge the gap from RBDMS to object oriented languages.

Since relational theory was formalized, SQL has been king. Hierarchical dbs (which document databases are) lost, relational dbs won. I would ask yourself, given that history, is your problem all that different from the majority of problems over the last 30 years that you need to revert to hierarchical form?

NoSql dbs are hip now for problems that require horizontal scaling (which SQL doesn't do well now). Does your problem require that?

Shlomo
  • 14,102
  • 3
  • 28
  • 43
  • I would disagree that serialization is an emulation of objects as it is in truest form a physical representation of the objects. Whereas using tables/columns is an emulation of storing the data in a way similar to serialization but is not serialization. In regards to the question about the last 30 years about SQL being king is in no way a validation of it being objectively better than other databases. In 30 years programming has become fundamentally object orientated which even in its early days was the reason hierarchical databases were created. – Chris Marisic Aug 10 '10 at 13:12
0

There are variants of Not only SQL databases outhere each has its own pros and cons.

there are document or object based, column based (wide row), key value based and graph based, and thats only what i can think of right now. Each of those types of database has its weaknesses and its strongnesses (compared to others and to RDBMS).

The real question that you need to ask yourself when making a decision for which DB type to choose is how you are going to use the data?

in most common cases, at least untill some level of object complexity, and for non-huge data, RDBMS care less about how the data is used and more about the data itself. In RDBMS you just need to know your data structure and internal relationships and after you realize that, you just put it in a normal form schema and if you put the right keys and indexes you get kicking performance on most queries.

In a NoSQL database its more crucial, for example a specicific weakness of the document based DBs is that if you need to make complex queries regarding multiple documents in most cases you will not get better performance than you would from a RDBMS.

for example, if you are maintaining Order documents, and wish to query the order with the maximal profit that was taken in a range of dates, afaik if you are not an expert (as i am no such) you will end up having a O(n) query, while in RDBMS it will take less and most certainly be more performant even if you are a MongoDB expert.

In conclusion, if you know in advance how your data is going to be used, and you know a document db would be performant for your use case then yeah, take that document DB, but if you are not sure how your data will be used, than RDBMS would generally be wiser decision.

And of-course there is the BigData argumant you need to take in account, as RDBMSs dont scale out (cant easily add nodes to support more traffic), and gets less performant when it deals with HUGE data (may start to lag in GBs or PBs).

Also, keep in mind that RDBMSs are way older and had been developed extensively over the years than document DBs which makes RDBMS contain more optimizations and tools than any of the NoSQL alternatives.

Ofek Ron
  • 8,354
  • 13
  • 55
  • 103
-1

It is important to remember that relational is still (and will continue to be for some time) the platform of choice for: transaction processing, master data management, reference data, data warehousing (in MPP), BI (though inverted column database are outstanding at query performance). Given the current state of NOSQL, it is nearly absurd that it can replace relational for the above uses.

TomFH
  • 55
  • 3
  • I specifically separated the role of OLTP vs OLAP, alot of your list directly falls into OLAP. I vehemently disagree with your usage of "transaction processing", unless you mean something different than OLTP because NoSQL is built entirely for OLTP. – Chris Marisic Jan 17 '14 at 14:24
  • I've been out for a year. Want to tidy up my comment on “transaction processing”. My definitions: Transaction: coherent unit of work making a change in a DB. Two types: 1. Simple: atomic operations on a single row, such as R/T data entry/ retrieval. Limited ACID. Ex: send a song/pix; post a social message. Low risk. 2. Critical: interaction of multiple objects. Full ACID-compliance. Multiple row updates into one all-or-nothing operation. Roll back. E.g.: place stock trade; submit insurance claim; transfer money. Involve risk. I meant most NOSQLs support only simple transactions, as above. – TomFH Aug 04 '15 at 17:07
  • There are NoSQL systems that support transactions spanning multiple objects like that. Also note that a non-relational system design wouldn't necessary emulate the way it would be done in a RDBMS. In general a well designed non-relational system should involve a much smaller number of objects needed to complete any transaction as the objects would likely be larger than a row in a RDBMS. – Chris Marisic Aug 04 '15 at 19:02
  • I don't know of any NoSQL systems that support transactions spanning multiple objects, but I would like to. Can you direct me to any because I would like to include them in a report I am writing for my customer? Thanks – TomFH Aug 05 '15 at 04:16
  • Btw, as to your original point, you are completely correct that RDBMSs are not good at saving and retrieving complex object graphs. I have always found OODBMSs much better at that. I believe most NOSQL data stores too are much better at such complex graphs, as long as the graphs are hierarchical. The hierarchical NOSQL's (key-value, document, column family) are good because you can usually store and load the whole graph as one complex object. Graph DBMSs would be faster and easier than an RDBMS because of their ability to navigate relationships. Still I'd go OODBMS for that part of your app.Tx – TomFH Aug 05 '15 at 15:09
  • RavenDB supports complex transactions up to and including support for distributed transactions using the Windows DTC. Not that i would ever recommend a distributed transaction for any reason. – Chris Marisic Aug 05 '15 at 16:21
  • Thanks. I'll look them up. – TomFH Aug 05 '15 at 18:51
-1

When I've investigated noSQL-style databases, I found that they did not provide ACID, nor did they provide relational features(not being relational databases). Since I like data consistency, and I have usually wanted some sort of relational feature, I've not selected noSQL databases.

However, I don't use the ORM tools out there, I tend to write SQL itself.

Paul Nathan
  • 39,638
  • 28
  • 112
  • 212
  • What sort of "relational feature"? That you can do joins on the data? – Chris Marisic Jul 20 '10 at 22:37
  • 1
    ACID and relational are orthogonal. Both [non-relational ACID](http://en.wikipedia.org/wiki/Berkeley_DB) and [non-ACID SQL](http://en.wikipedia.org/wiki/MyISAM) are in widespread use. – taw Aug 05 '10 at 05:17
-2

My way of looking at the question is the opposite: Why would I ever need noSQL at all ?

SQL provides me with relational modelling, transactions, triggers, keys, constraints, dynamic schemas that can be modified in the blink of an eye YET guarantee data integrity, blazing fast complex queries on data that is represented in its purest and cleanest form.

Your problem is that you're trying to fit square pegs in round holes: objects and rdbms's don't go well together, because the RDBMS is designed to handle many of your more complex get/set logic, and enforce consistency, which is exactly what you expect from your object layer.

Protip: drop the objects, they're not the right tool for the job.

Morg.
  • 697
  • 5
  • 7
-2

My key question was where would a SQL database really outshine a document database and from all the responses there really doesn't seem to be much.

Given that NoSQL databases come in just as many variations of types of databases as relational that both match all or some parts of ACID depending on which database you use that at this point they are basically the equitable for solving problems.

After this the key differences would be tooling and maturity which SQL databases have a much larger grasp in for being the established player but this is how it is for all new technology.

Chris Marisic
  • 32,487
  • 24
  • 164
  • 258
  • Don't forget you took away two important arguments: Query flexibility/Reporting and the lack of power of key/value store. Basically, you can boil your question down to, is Sql without all that's good about it better than NoSql without all of its problems. Also from a business perspective (leaving aside technology), an important factor is community adoption, and any SQL implementation is still light years ahead of NoSql in that regard. – Shlomo Aug 10 '10 at 14:42
  • A key-value store is a very special tool used for specific purposes and has never been meant to replace a SQL database. I did not state any where about query flexibility, I stated about reporting which should in theory be done in an OLAP database, not a relational database. – Chris Marisic Aug 10 '10 at 17:01
  • @coolgeek that article ever further resounds my point of not using a normal database for reporting, FTA "While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database which was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate the database." So while it might leverage a direct relational database for the storage it still requires you to maintain an OLAP database. – Chris Marisic Aug 13 '10 at 15:06