1

Imagine that we have a website where users can read articles, view photos, watch videos, and many more. Every "item" may be commented, so that we need space to save that comments somewhere. Let's discuss storage possibilities for this case.


Distributed solution

We can obviously create separate tables for each "item", so that we have tables like:

CREATE TABLE IF NOT EXISTS `article_comments` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `createdBy` int(11) DEFAULT NULL,
  `createdAt` int(11) DEFAULT NULL,
  `article` int(11) DEFAULT NULL,
  `content` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

and then obviously photo_comments, video_comments, and so on. The advantages of this way are as follows:

  • we can specify Foreign Key to every "item" table,
  • database is divided into logical parts.
  • there is no problem with export of such data.

Disadvantages:

  • many tables
  • probably hard to maintain (adding fields, etc.)

Centralized solution

On the other hand we can merge all those tables into two:

CREATE TABLE IF NOT EXISTS `comment_types` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

and

CREATE TABLE IF NOT EXISTS `comments` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `createdBy` int(11) DEFAULT NULL,
  `createdAt` int(11) DEFAULT NULL,
  `type` int(11) DEFAULT NULL,
  `content` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Table comment_types is a dictionary, it contains key-value pairs of commented item "type" and its name, for example :

1:Articles
2:Photos
3:Videos

Table comments stores usual data with additional type field.

Advantages:

  • Maintenance (adding / removing fields),
  • Adding new comment types "on the fly".

Disadvantages:

  • Harder to migrate / export,
  • Possible performance drop when querying large dataset.

Discussion:

  • Which storage option will be better in terms of query performance (assume that dataset IS big enough for that to be the case),
  • Again performance - will adding INDEX on type remove or drastically reduce that percormance drop?
  • Which storage option will be better in terms of management and possible migration in the future (distributed will be better, of course, but let's see if centralized one isn't the one far away)
Tomasz Kowalczyk
  • 10,472
  • 6
  • 52
  • 68
  • Well, what you propose is a simplified form of the Smith&Smith model. Nothing wrong with it, excellent performance on modern SQL implementations. – wildplasser Oct 26 '11 at 17:15
  • @wildplasser - What is the Smith&Smith model? First time I have heard of it. Can you link to it? – Oded Oct 26 '11 at 17:16
  • Never heard about `Smith&Smith`, can you share some links / elaborate a little bit more on the topic? – Tomasz Kowalczyk Oct 26 '11 at 17:16
  • not sure if there is an 'answer' but my opinion is go with the second option. - yes indexes will help. – Randy Oct 26 '11 at 17:18
  • http://www.google.nl/url?sa=t&rct=j&q=smith%20%26%20smith%20data%20model&source=web&cd=4&sqi=2&ved=0CDsQFjAD&url=http%3A%2F%2Fwww.igi-global.com%2Fviewtitlesample.aspx%3Fid%3D51125&ei=wkCoTtarGYag-wbL4PXMDw&usg=AFQjCNFpBeWL0VB7vo1h0R7gX_nBxJ6bgg&cad=rja (PDF alert) – wildplasser Oct 26 '11 at 17:22
  • Lazy Badger, can you explain what errors I made? – Tomasz Kowalczyk Oct 26 '11 at 17:39
  • http://svensky.free.fr/Cours/DDB/Smith%20-%20Data%20base%20Abstractions%20Aggregation%20and%20Generalization.pdf (PDF alert) this one is better. Hard to find, because publicated before the internet era. – wildplasser Oct 26 '11 at 17:40

1 Answers1

1

I'm not sure either of the disadvantages you list for option 2 are serious, data export is easily accomplished with a simple WHERE clause and I wouldn't worry about performance. Option 2 is properly normalised and in a modern relational database performance should be excellent (and can be tweaked further with appropriate indexes etc if necessary).

I would only consider the first option if I could prove that it was necessary for performance, scalability or other reasons - but it must be said that seems unlikely.

Simon
  • 6,062
  • 13
  • 60
  • 97
  • You've covered most points - thank you. For the disadvantages - I had to put something there so I thought of everything, even unlikely, to fill that part so that someone can prove me wrong. :) – Tomasz Kowalczyk Oct 26 '11 at 18:24