12

EDIT I've just started skimming Codd's famous 1970 paper that started it all, that Oracle was based on (A Relational Model of Data for Large Shared Data Banks [pdf]), and was amazed to find that it seems it will answer this SO question. It talks about databases in the market at that time ("hierarchical" and "network" - like NoSQL?), the need for independence from internal representation, and a clear explanation of how to apply mathematical "relations" to a database.


Historically, what feature of relational databases gave what benefit that caused businesses to adopt it, making it massively successful?

Today, there are many reasons to use a RDB: it's standard, products are mature, debugged, full-featured, there's a choice of vendors, there's support, there's a trained workforce and so on. But why did it become so popular?

I've heard "hierarchical databases" were popular before relational databases - they sound like a key-value store, where the value can be another set of key-values. If so, that is similar to the object oriented databases that were publicized a decade or two ago; and also to XML/document databases and NoSQL.

Maybe ACID transactions (atomicity etc)? But that doesn't seem specific to RDB.

Maybe because relational databases enabled you to define a data schema that was purely about the data - independent of a particular programming language, version of an application (evolution), or purpose of the application (this makes "impedance mismatch" an inevitable) But any database with a data schema has this feature.

Maybe because the relational model is mathematically sound? But this doesn't sound like it would convince managers to adopt it - and what would be the business benefit.

Maybe because the mathematical model gives you a way to rearrange the database into different normal forms to give different performance characteristics, which are mathematically guaranteed to not change the meaning of the data? This seems plausible, and my uni textbooks make a big deal of it, but it doesn't sound very compelling to me as a practical business benefit (maybe I'm missing something)?

To summarise: historically, what made the relational model win so decisively over the hierarchical model? I'm also interested in whether RDB still have some special quality that actively makes them a better practical choice for businesses (other than the benefits of being a standard mentioned above).

Many thanks if you can shed some light - I've long been curious about this.

bmargulies
  • 97,814
  • 39
  • 186
  • 310
13ren
  • 11,887
  • 9
  • 47
  • 64

5 Answers5

6

For the same reason why the script languages are popular.

You can make a query with your favorite text editor and just issue it, without bothering about the actual physical schema.

It's not the fastest model, not the most reliable model — it's just the most productive model. You can write ten times as many queries in an hour.

You may want to read this article in my blog which compares the most popular database models:

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • now a community wiki - please feel free to modify/add! thanks for your article, it seems to be right on the money. I'm having trouble connecting your answer here and it, but I haven't read the article in full detail yet. – 13ren Mar 04 '10 at 08:02
4

The concept of making a logical representation of data abstracted from its physical representation was perhaps the most game-changing aspect of Codd's idea. He was apparently the first person who fully realised the benefits of separating logical and physical concerns and therefore the first to devise a data model worthy of the name. By describing a model based on relations, without navigational links or pointer structures he also created something uniquely powerful, flexible and of lasting relevance.

To be accurate it must be said that it was the SQL model rather than the relational one which eventually proved more successful commercially. SQL is a long way from a truly relational data model or language even though it would not have come into being without Codd's ideas to inspire it. The relational model's creator was naturally disappointed that SQL rather than relational became the database standard. Four decades later I think we have plenty of cause to regret that Codd's relational model isn't better supported by DBMS software.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • Thanks - "a model based on relations [rather than] pointers" - this seems to be the answer, though I need more context to understand it. I remember the logical/physical distinction being emphasized. "SQL rather than relational became the database standard" - but isn't SQL a (partial) implementation of the relational model? [BTW: I feel you've left a word out when you compare SQL and relational, since they seem to be different kinds of things - but probably I just don't understand the background well enough) – 13ren Mar 05 '10 at 15:15
2

From my knowledge, it is the normalization theory (the well known Codd's Third Normal Form) to define relational data model that is easy and efficient for storing and retrieving. This followed by the Standard Query Language (SQL) which allows it to be used across all the relational db system. Standardization was definitely lacking back then which also make this appealing to many.

Fadrian Sudaman
  • 6,405
  • 21
  • 29
  • Thanks - I was reading in "Crossing the Chasm" that Oracle spearheaded the standardization on SQL, by porting their database to everything in sight (even though SQL was developed by IBM). This standardization helped relational databases enormously, but I believe they were already popular (and had "won" against hierarchical databases) before then. What made the Third Normal Form special compared to the others? – 13ren Mar 05 '10 at 15:09
2

I believe that none of the answers really nailed it, as you are interested about historical aspect of it.

If I were to single out a reason I would say Quassnoi was close, but SQL was not just any language - it has one great feature that was not common in 70s:

  • it is declarative, it describes what you want to get and does not prescribe how to do it

I don't think it is possible to overstate this.

of course, the relational model is also big factor (and is related to the above)

  • also not all databases that have a schema are only about data; hierarchical and network databases are also about how to get data, their structure determines the most efficient method to access data and therefore structure influence how will you model the problem (in another words modelling process is influenced by the way application will use it; this has an effect that another application that might need to use IMS for example would be in a severely disadvantaged position, or that some changes in the application would not require changes in the database design to achieve good performance - for example a need to sort by some new column)

note: on some of the above intentions SQL and 'R'DBMSes did not fully deliver (and/or other models overcame their problems in some way), but these were initial intentions and considering how stable SQL was in past ~40 years (here is a link to IBM's paper from 1974) or so it did not do such a bad, bad job either.

There is also this quote from here

“Ted’s basic idea was that relationships between data items should be based on the item’s values, and not on separately specified linking or nesting. This notion greatly simplified the specification of queries and allowed unprecedented flexibility to exploit existing data sets in new ways,” said Don Chamberlin, co-inventor of SQL, the industry-standard language for querying relational databases, and a research staff member at Almaden. “He believed that computer users should be able to work at a more natural-language level and not be concerned about the details of where or how the data was stored.”

At a 1995 reunion of IBM’s early relational database scientists, Chamberlin recalled having an epiphany as he first heard Codd describe his relational model at an internal seminar.

“Codd had a bunch of fairly complicated queries,” Chamberlin said. “And since I’d been studying CODASYL (the language used to query navigational databases), I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff. Codd would sort of write them down as one-liners. ... (T)hey weren’t complicated at all. I said, ‘Wow.’ This was kind of a conversion experience for me. I understood what the relational thing was about after that.”

I seem to remember a transcript of an interesting interview about begging of SQL but can't track it down..

Unreason
  • 12,556
  • 2
  • 34
  • 50
1

One key was the self contained products - you no longer had to manually define and maintain your key files (indexes) and the ability to change the data model with less effort. Combine that with the SET based structures made it a compelling product set to work with. Combine the SQL language on top of that to return data and it was a win-win situation over traditional ISAM data structures primarily associated with COBOL languages.

Mark Schultheiss
  • 32,614
  • 12
  • 69
  • 100
  • Thanks - your 1st and 3rd points seem to be about usability (self-contained & SQL). Couldn't the same have been implemented for the hierarchical model - or was there something different about the relational model that enabled them? i.e. what's the intrinsic feature that led to the benefit? Your 2nd point (SET based) seems intrinsic, though SQL doesn't fully implement the relational model, so the thing that has become so popular (SQL) is not actually "relational", just some aspects of it. I wonder what they are? [or maybe it's just luck that vendors made the usability changes to this model] – 13ren Mar 05 '10 at 15:33
  • 1
    First & 3rd points: separation of the data and code was distanced - still a lot of data stuff in code with SQL but much easier to separate - the other was VERY intertwined and while possible, very little was done along those lines. The second point revolves around data-retrieve a set rather than having to do stuff like a join manually in code with mass of data, thus less efficient. – Mark Schultheiss Mar 08 '10 at 17:48