2

For a project I need to implement row versioning in a MySQL database. After reading about the possible solutions I've chosen to use one table and add a start_date and end_date column, that specify when that version/row was active.

The id column will be the same for multiple versions of a row. Therefore it's no longer unique in the table. Now I'm not sure how to set up the primary key (and other indexes), while keeping auto increment active for the id column.

I see two options. The first one is making id just an index, like this:

CREATE TABLE `thing` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  … other_columns …,
  `start_date` datetime NOT NULL,
  `end_date` datetime DEFAULT NULL,
  KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The other is making id and start_date the primary key, like this:

CREATE TABLE `thing` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  … other_columns …,
  `start_date` datetime NOT NULL,
  `end_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`,`start_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

What would be the best option, especially when taking performance into consideration?


For example, the data would look like this:

id | some_column | start_date | end_date
---|-------------|------------|-----------
1  | Jonathn     | 2013-01-01 | 2013-01-02
1  | Jonathan    | 2013-01-02 | NULL
2  | James       | 2013-02-01 | NULL

In this case I added "Jonathn" but changed it to "Jonathan" later (while keeping the same ID). Later another row ("James") is added.

Jonathan
  • 6,572
  • 1
  • 30
  • 46
  • Neither of the above. First identify what is the key which is invariant over time, i.e. the key that identifies the thing you are tracking different versions of (AKA "business key", "domain key", "natural key"). Make that attribute / attributes a composite key with start_date. – nvogel Sep 18 '13 at 18:08
  • @sqlvogel I think I wasn't clear enough in my question. The `id` would not change for new versions of a row, so that'd be the invariant key. – Jonathan Sep 18 '13 at 19:58
  • @Jomathan, then it should not be autoincrement because autoincrement would force a new value for every row. – nvogel Sep 18 '13 at 20:15
  • @sqlvogel I've quickly tried the options described in my question, and in both cases I could just do `INSERT INTO thing (id) VALUES (1);` to insert a new version for an existing row with ID 1. – Jonathan Sep 18 '13 at 20:22

1 Answers1

0

Your first example CREATE TABLE statement doesn't have a key*, your second one does. I think your second example is what you want:

CREATE TABLE `thing` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  … other_columns …,
  `start_date` datetime NOT NULL,
  `end_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`,`start_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

You may or may not want id to be an auto-incrementing column but that's primarily a question of functionality (do you want to be able to generate incrementing values or just insert them yourself) rather than logical design or performance.

*MySQL unfortunately uses the keyword KEY to designate an index. Keys have to be specified with the UNIQUE KEY or PRIMARY KEY syntax.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • Thanks for the explanation, I'll go with this set-up. Is this (performance-wise) an acceptable method, or would you personally recommend something else? – Jonathan Sep 19 '13 at 07:02
  • As I'm currently implementing this method for one table in the database, I was thinking whether it might be better to use end_date (instead of start_date) for the PK. The pro would be that the table won't accept two rows with the same id and NULL as end_date (so you can be sure there's always at most one current row for an id in the table). Since start_date and end_date both don't have much semantic meaning, I don't see a downside of this. What do you think? – Jonathan Sep 21 '13 at 15:41