0

I've created many databases before, but I have never linked two tables together. I've tried looking around, but cannot find WHY one would need to link two or more tables together.

There is a good tutorial here that goes over database relationships, but does not explain why they would be needed. He just simply says that they are.

Are they truly necessary? I understand that (in his example) all orders have a customer, and so one would link the orders table to the customers table, but I just don't see why this would be absolutely necessary. I can (and have) created shopping carts and other complex databases that work just fine without creating any table relationships.

I've just started playing around with MySQL Workbench v6.0 for a new project that has a fairly large and complex database, and so I'm wondering if I am losing anything by creating the entire project without relationships?

NOTE: Please let me know if this question is too general or off topic, and I will change it. I understand that a lot can be said about this topic, and so I'm really just looking to know if I am opening myself up to any security issues or significant performance issues by not using relationships. Please be specific in your response; "Yes you are opening yourself up to performance issues" is useless and not helpful for myself, nor for anyone else looking at this thread at a later date. Please include details and specifics in your response.

Thank you in advance!

Birrel
  • 4,754
  • 6
  • 38
  • 74
  • 1
    are you talking about defining foreign keys in mysql or using joins in general? – Sam Feb 18 '14 at 21:05
  • 1
    Using joins in general. There are plenty of tutorials available that explain *how* to do all these things, but none of them explain *why* you would do them in the first place. – Birrel Feb 18 '14 at 21:08
  • 1
    The answer to this question is too much to write in one post. Read on Normalization and Normal forms – Sam Feb 18 '14 at 21:11
  • 1
    "The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key, the whole key and nothing but the key and with this goal come benefits in the form of **reduced redundancies, fewer anomalies, and improved efficiencies**." http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/intro-to-normalization.html – showdev Feb 18 '14 at 21:15
  • @SamD I expected that it would be a hefty question to answer on here, so I appreciate the suggestion of where to go from this point. – Birrel Feb 18 '14 at 21:29
  • There are some resources online that explain normal forms and the reasoning behind them for example http://databases.about.com/od/specificproducts/a/normalization.htm For more concrete understanding get a book on database design. http://www.amazon.com/Beginning-Database-Design-Novice-Professional/dp/1430242094 – Sam Feb 18 '14 at 21:33

3 Answers3

2

As Sam D points out in the comments, entire books can be written about database design and why having tables with relationships can make a lot of sense.

That said, theoretically, you lose absolutely no expressive/computational power by just putting everything in the same table. The primary arguments against doing so likely deal with performance and maintenance issues that might arise.

Patrick87
  • 27,682
  • 3
  • 38
  • 73
  • I still use multiple tables, opposed to jamming everything into one. I just don't use relationships. I would still have a 'customers' table and an 'orders' table, and the latter of the two would have a column stating which customer made the order, but I just would not include any sort of link between the two. – Birrel Feb 18 '14 at 21:18
  • So are all your JOINs made on the application side? – planestepper Feb 18 '14 at 21:19
  • What do you mean? If I have two separate tables that work in concert with one another, I will create a column in one of them that is used as a reference to point to the other. When I execute the PHP script upon page load, the reference is read and used to access the other table. If I were to use Workbench to create the database, I would not include relationships in the layout and simply use the reference columns that I have been using in the past - which seem to be essentially the same thing as defining relationships. – Birrel Feb 18 '14 at 21:27
  • 1
    @Birrel If I understand you correctly, a better way to think about how you use relational databases might be as follows: you do make use of relationships, although you don't codify these relationships into the database schema. Of course, you're free to do this; the disadvantage with this is that the overall state of all tables might become inconsistent (i.e., maybe your Orders table references a Customer that has been deleted from the Customers table, etc.) Defining foreign keys helps prevent these and similar maintenance issues. – Patrick87 Feb 18 '14 at 21:28
  • Ok. so you do use joins but don't define foreign key relationships in mysql am I right? – Sam Feb 18 '14 at 21:28
  • 1
    To say a bit more... establishing foreign keys also helps define the semantics of your tables, by codifying into the database that Column A of Table B contains the same kind of data as Column X of Table Y and, indeed, the values in Column A will form a subset of the values in Column X. Otherwise, you're always free to compare any similarly-typed values you want; you can join two tables based on Age = Zipcode if you want, although these represent very different kinds of information. – Patrick87 Feb 18 '14 at 21:33
  • 1
    @Patrick87 thank you for your input. I never considered that creating relationships would help mitigate issues with inconsistencies, but I take great care to ensure that all referenced data remains intact, and to automatically remove the reference if said data is terminated or compromised. – Birrel Feb 18 '14 at 21:33
  • @SamD that sounds about right. I do not define any foreign key relationships, but I do utilize joins (via PHP script used to access the database). – Birrel Feb 18 '14 at 21:35
  • 2
    then it is not necessary to do so as mentioned by Patrick but considered a best practice because foreign key in schema helps maintain referential integrity of the data and define what to happen in case a parent table gets deleted. for example on cascade all the child rows that are connected to a parent entity will get deleted. Read http://en.wikipedia.org/wiki/Foreign_key – Sam Feb 18 '14 at 21:37
1

The answer revolves around granularity, space consumption, speed, and detail.

Inherently different types of data will be more granular than others, as items can always be rolled up to a larger umbrella. For a chain of stores, items sold can be rolled up into transactions, transactions can be rolled up into register batches, register batches can be rolled up to store sales, store sales can be rolled up to company sales. The two options then are:

  1. Store the data at the lowest grain in a single table
  2. Store the data in separate tables that are dedicated to purpose

In the first case, there would be a lot of redundant data, as each item sold at location 3 of 430 would have store, date, batch, transaction, and item information. That redundant data takes up a large volume of space, when you could very easily create separated tables for their unique purpose.

In this example, lets say there were a thousand transactions a day totaling a million items sold from that one store. By creating separate tables you would have:

  • Stores = 430 records
  • Registers = 10 records
  • Transactions = 1000 records
  • Items sold = 1000000 records

I'm sure your asking where the space savings comes in ... it is in the detail for each record. The store table has names, address, phone, etc. The register has number, purchase date, manager who reconciles, etc. Transactions have customer, date, time, amount, tax, etc. If these values were duplicated for every record over a single table it would be a massive redundancy of data adding up to far more space consumption than would occur just by linking a field in one table (transaction id) to a field in another table (item id) to show that relationship.

Additionally, the amount of space consumed, as well as the size of the overall table, inversely impacts the speed of you querying that data. By keeping tables small and capitalizing on the relationship identifiers to link between them, you can greatly increase the response time. Every time the query engine needs to find a value, it traverses the table until it finds it (that is a grave oversimplification, but not untrue), so the larger and broader the table the longer the seek time. These problems do not exist with insignificant volumes of data, but for organizations that deal with millions, billions, trillions of records (I work for one of them) storing everything in a single table would make the application unusable.

There is so very, very much more on this topic, but hopefully this gives a bit more insight.

PlantTheIdea
  • 16,061
  • 5
  • 35
  • 40
  • Thank you for your detailed answer, I greatly appreciate it. I separate everything into smaller tables (opposed to one giant table), as you have suggested, I just do not have any relationships defined between them. I don't anticipate that this will ever get into the trillions, billions or even millions, but it is good to know! Thanks again! – Birrel Feb 18 '14 at 21:21
  • 1
    Inherently by having items like customer ID stored in your order table to identify what customer bought the order provides a link between the two pieces of information. You may not have set up a hard RI (we never do ourselves), but the fact you have data that can interconnect with another means you do in fact have a relationship borne from it. But what matters more than anything else is how you use your data, and especially if you split (normalize) your table structure, proper relational setups will make your applications a lot faster, regardless of table size. – PlantTheIdea Feb 18 '14 at 21:27
  • Perfect! Hearing that the pros go the same route as I have previously done makes me feel a lot better about it! I should rephrase... "hearing that *I* go the same route as the pros..." It makes sense to hear you say that the relationship exists because of the references I include. I just don't set up the relationships within the database via SQL script or anything like that. – Birrel Feb 18 '14 at 21:39
  • 1
    @Birrel Just because we do it too, doesn't make it good practice :) – Patrick87 Feb 18 '14 at 21:40
1

Short answer: In a relational database like MySQL Yes. Check this out about referential integrity http://databases.about.com/cs/administration/g/refintegrity.htm

That does not mean that you have to use relational databases for your project. In fact the trend is to use Non-Relational databases (NoSQL), like MongoDB to achieve same results with better performance. More about RDBMS vs NoSQL http://www.zdnet.com/rdbms-vs-nosql-how-do-you-pick-7000020803/

I think that with this example you will understand better:

Let's we want to create on-line store. We have at minimum Users, Payments and Events (events about the pages where the user navigates or other actions). In this scenario we want to link in a secure and relational way the Users with the Payments. We do not want a Payment to be lost or assigned to another User. So we can use a RDBMS like MySQL to create the tables Users and Payments and linked the with proper Foreign Keys. However for the events, we are going to be a lot of them per users (maybe millions) and we need to track them in a fast way without killing the relation database. In that case a No-SQL database like MongoDB makes totally sense.

To sum up to can use an hybrid of SQL and NO-SQL, but either if you use one, the other or both kind of solutions, do it properly.

Rafa Paez
  • 4,820
  • 18
  • 35