2

TARGET_RDBMS: MySQL-5.X-InnoDB ("X" equals current stable release)

BACKGROUND: Building my first database with true referential integrity constraints, in an effort to get feedback, after creating the "real" DDL, I've made an abstraction that I believe covers the "feel" of the database; this is only 3 tables of about 20, all with referential integrity constraints; only pattern I see that is missing is a composite key table, which does not have data to be dumped in right now anyway, so I'm just focus on the first iteration.

Sample Data / Unit Test: One thing I do not know is how to build out a sample data set that will offer 100% coverage of the referential integrity modeled -- AND build "Unit Test" around that sample data and this DDL:

Sample DLL:

(Note: Just to be clear, the LEGEND and naming standards are JUST for this example, which I've abstracted from the "real" database. The column names are robotic in nature, and meant to make the meaning and relationship of a given instance as clear as possible. If you have suggestions on the notation system used, please feel free to comment. I'm open to any suggestions. Thanks!)

CREATE DATABASE sampleDB;

use sampleDB;

# ###############
# LEGEND
# - sID = surrogate key
# - nID = natural key
# - cID = common/shared across tables, but NOT unique/natural-key
# - PK = Primary Key
# - FK = Foreign Key
# - data01 = Sample data (non-key,not-shared-across-tables)
# - data02 = Sample data NOT NULL (non-key,not-shared-across-tables)
#
# - uID = user defined unique/natural key (NOTE: not used)

# ###############
# Behavior
# - create_timestamp (NOT NULL, updated on record creation, NOT update)
# - update_timestamp (NOT NULL, updated on record creation AND updates)

CREATE TABLE `TABLE_01` (
  `TABLE_01_sID_PK` MEDIUMINT NOT NULL AUTO_INCREMENT,
  `TABLE_01_cID` int(8) NOT NULL,
  `TABLE_01_data01` varchar(128) default NULL,
  `TABLE_01_data02`  varchar(128) default NULL,
  `create_timestamp` DATETIME DEFAULT NULL,
  `update_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`TABLE_01_sID_PK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `TABLE_02` (
  `TABLE_02_sID_PK` MEDIUMINT NOT NULL AUTO_INCREMENT,
  `TABLE_02_nID_FK__TABLE_01_sID_PK` int(8) NOT NULL,
  `TABLE_02_cID` int(8) NOT NULL,
  `TABLE_02_data01` varchar(128) default NULL,
  `TABLE_02_data02` varchar(128) NOT NULL,
  `create_timestamp` DATETIME DEFAULT NULL,
  `update_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`TABLE_02_sID_PK`),
  FOREIGN KEY (TABLE_02_nID_FK__TABLE_01_sID_PK) REFERENCES TABLE_01(TABLE_01_sID_PK),
  INDEX `TABLE_02_nID_FK__TABLE_01_sID_PK` (`TABLE_02_nID_FK__TABLE_01_sID_PK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `TABLE_03` (
  `TABLE_03_sID_PK` MEDIUMINT NOT NULL AUTO_INCREMENT,
  `TABLE_03_nID_FK__TABLE_01_sID_PK` int(8) NOT NULL,
  `TABLE_03_nID_FK__TABLE_02_sID_PK` int(8) NOT NULL,
  `TABLE_03_cID` int(8) NOT NULL,
  `TABLE_03_data01` varchar(128) default NULL,
  `TABLE_03_data02` varchar(128) NOT NULL,
  `create_timestamp` DATETIME DEFAULT NULL,
  `update_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`TABLE_03_sID_PK`),
  FOREIGN KEY (TABLE_03_nID_FK__TABLE_01_sID_PK) REFERENCES TABLE_01(TABLE_01_sID_PK),
  FOREIGN KEY (TABLE_03_nID_FK__TABLE_02_sID_PK) REFERENCES TABLE_02(TABLE_02_sID_PK),
  INDEX `TABLE_03_nID_FK__TABLE_01_sID_PK` (`TABLE_03_nID_FK__TABLE_01_sID_PK`),
  INDEX `TABLE_03_nID_FK__TABLE_02_sID_PK` (`TABLE_03_nID_FK__TABLE_02_sID_PK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SHOW TABLES;

# DROP DATABASE `sampleDB`;

# #######################
# View table definition
# DESC inserttablename;

# #######################
# View table create statement
# SHOW CREATE TABLE example;

Questions:

Any and all feedback on missing, wrong, or "better" ways to do this database build are welcome. If you have questions, just comment -- and I'll respond ASAP. Again, thanks~!

UPDATE (1):

Just added "MEDIUMINT NOT NULL AUTO_INCREMENT" to the PKs -- not sure how I left that off.

blunders
  • 3,619
  • 10
  • 43
  • 65

1 Answers1

3

First of all, I want to applaud you for defining a standard. There is no end to how much it will come to help you in the future.

Having said that, a couple of very subjective opinions from my part:

I don't like to embed type information in names, such as "TABLE_PERSON" or "PERSON_T" because it becomes confusing the second you replace a table with a view instead. At this point you could of course search and replace "PERSON_T" with "PERSON_VW" instead, but it kind of misses the point :) The same goes for columns (although i can't see this in your example). Think of the "n_is_dead" column that gets changed from numeric to varchar.

Can a row exist in a table without being created (create_timestamp)? Declare columns as NOT NULL if they really can't be null. In fact, I start of having NOT NULL on most of my columns because it makes me think harder about the nature of the data.

I'm a fan of naming the primary key column something other than ID. For example

company(company_id, etc)
person(person_id, company_id, firstname etc)

I've heard some people have problems with O/R mappers that want you to have the primary key named "ID" at all times, but I don't know if this is still true of if this has changed recently.

It's not clear to me if you intented to embed (s,n,c) in the column names to indicate whether they are surrogate, natural or common key. But I also don't think this is a good idea. I feel that would "reveal" some implementation detail that doesn't fit naturally in the logical model.

It looks like you are exposing/embedding the foreign key relationship in the column names. I have never thought of this, but I think you will deeply regret this one. If not only because it makes the column names unbearably uggly :)

When choosing a name for an index. The only time I regret naming an index something is when I look at an execution plan and see "index_01" being used. I always wish I had put the column name in the index to make it visible in the xplan. I don't know the limit for an index name, but I always run into the limit on Oracle. So, try to come up with some rule for how to abbreviate the table name. The column name is the important thing here.

Regarding mixed case. I always (no exceptions) go with either ALL_UPPER_CASE or all_lower_case. The reason is that in the past I've been burned when migrating queries between databases when they treat case differently. Lately, I use all_lower_case because the typical font of our editors makes it easier to spot spelling errors in lower case than in upper case. And when I fail at things, it doesn't seem like the editor is SHOUTING AT ME ;)

Ronnis
  • 12,593
  • 2
  • 32
  • 52
  • @Ronnis: Thanks, the "embed type information in names" are just for the sample abstraction; meaning I wanted it to be as abstract as possible without losing meaning AND make it as clear as possible what my understanding of that meaning was in the code itself. Clearly something like TABLE_03_nID_FK__TABLE_02_sID_PK is crazy hard to read, BUT I can't count the number of times I've seen a DDL abstract or real that I had no idea what the meaning or relation of a given element was. – blunders Dec 03 '10 at 15:59
  • @Ronnis: Also, still looking over your feedback, but just want to say thanks, means a lot to me to get feedback. Meaning even the fact someone looked at the code makes a HUGE difference to me, since I'm a "cowboy coder" and have no internal means of getting feedback. – blunders Dec 03 '10 at 16:00
  • 1
    @Ronnis: "In fact, I start of having NOT NULL on most of my columns because it makes me think harder about the nature of the data." First I've ever read it anywhere, and really like the concept in part because clearly you can't have all fields default to NULL. Thanks for that pointer. – blunders Dec 03 '10 at 16:02
  • @Ronnis: "I've heard some people have problems with O/R mappers that want you to have the primary key named "ID" at all times, but I don't know if this is still true of if this has changed recently." Interesting to know, since my likely next step is to pick an app framework with an ORM. Completely agree about not using just ID for the PK if possible, but also have a very limit level of experience too. – blunders Dec 03 '10 at 16:05
  • @Ronnis: "It's not clear to me if you intented to embed (s,n,c) in the column names to indicate whether they are surrogate, natural or common key." First, after posting my code it appears there's an issue with some of my reference to (s,n) though have to think about it. Again, all the column names are abstraction meant to cover meaning and relations, intent for the (s,n,c) was more for my goal of being able to deploy unit test and sample data. Plus, guess it also says something about my subjective choice to use surrogate keys over natural keys. – blunders Dec 03 '10 at 16:10
  • @Ronnis: Agree about NOT using mixed case, and using either ALLCAPS or alllowercase -- just used mixed casing in the example for my own clearly biased idea of what is readable for this one off example. – blunders Dec 03 '10 at 16:14
  • @Ronnis: "When choosing a name for an index." Agree with your points, though know nothing about "execution plans" or execution plan in MySQL. Have any suggested resources for learning more? Again, deeply thank you for all your input. Cheers! – blunders Dec 03 '10 at 16:16
  • I'm just happy to help :) I don't know how the execution plans work in detail in MySQL (I have an Oracle background). You might have some look posting another question about it here or you can go check out http://forums.mysql.com – Ronnis Dec 03 '10 at 18:42
  • About the actual advice I've given. I don't think you should see them as some universal truth, more as "food for thought". Many people will disagree with my conclusions. They are just opinions I've formed by doing just the kind of work I've been doing (Large scale DW with handcrafted ETL). In another context (using frameworks and O/R-mappers) they might not apply. – Ronnis Dec 03 '10 at 18:46
  • 3
    Using ID is a poor choice for when you get to complex queries as it can lead to incorrect results when somone copies a join and changes the tablename but misses changing the alias in the join condition. It also is a pain for reporting when you may need to return 6 columns with the same name. – HLGEM Dec 03 '10 at 19:32
  • @Ronnis: Yes, I know your input is based on your take and experiences, but so is everyone's, regardless if they disclaim it or not. There's no right answer in the world of data, only the best answer at the time. – blunders Dec 04 '10 at 02:32