Imagine that we have a website where users can read articles, view photos, watch videos, and more. Every "item" can be commented on, so we need somewhere to store those comments. Let's discuss the storage options for this case.
Distributed solution
We can obviously create separate tables for each "item", so that we have tables like:
CREATE TABLE IF NOT EXISTS `article_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`createdBy` int(11) DEFAULT NULL,
`createdAt` int(11) DEFAULT NULL,
`article` int(11) DEFAULT NULL,
`content` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
and then, in the same way, photo_comments, video_comments, and so on.
The advantages of this approach are as follows:
- we can specify a foreign key to every "item" table,
- the database is divided into logical parts,
- exporting such data is straightforward.
Disadvantages:
- many tables,
- probably hard to maintain (every schema change, such as adding a field, has to be repeated in each table).
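To make the trade-off concrete, here is a minimal sketch of both sides: the foreign key this layout allows, and the per-table query it forces. The `articles` table, the `photo_comments` table with its `photo` column, and the user id 7 are assumptions made for illustration only; they are not defined in the question.

```sql
-- Assumes a hypothetical `articles` table whose primary key `id` is int(11).
ALTER TABLE `article_comments`
  ADD CONSTRAINT `fk_article_comments_article`
  FOREIGN KEY (`article`) REFERENCES `articles` (`id`)
  ON DELETE CASCADE;

-- Listing one user's comments across item types needs one branch per table.
-- `photo_comments`.`photo` is assumed to mirror `article_comments`.`article`.
SELECT 'article' AS item_type, `id`, `article` AS item_id, `createdAt`, `content`
FROM `article_comments`
WHERE `createdBy` = 7
UNION ALL
SELECT 'photo', `id`, `photo`, `createdAt`, `content`
FROM `photo_comments`
WHERE `createdBy` = 7
ORDER BY `createdAt` DESC;
```

Each new item type adds another table, another constraint, and another branch in queries like this one.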
Centralized solution
On the other hand, we can merge all those tables into two:
CREATE TABLE IF NOT EXISTS `comment_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
and
CREATE TABLE IF NOT EXISTS `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`createdBy` int(11) DEFAULT NULL,
`createdAt` int(11) DEFAULT NULL,
`type` int(11) DEFAULT NULL,
`content` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Table comment_types is a dictionary: it contains key-value pairs of the commented item's "type" id and its name, for example:
1:Articles
2:Photos
3:Videos
Table comments stores the usual data plus an additional type field.
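A minimal sketch of how the two tables might be used together follows. Note that `comments` as defined above has no column identifying which article, photo, or video a comment belongs to, so the sketch adds a hypothetical `item` column; that column and the item id 42 are assumptions, not part of the original schema.

```sql
-- Seed the dictionary with the three types listed above.
INSERT INTO `comment_types` (`id`, `name`) VALUES
  (1, 'Articles'), (2, 'Photos'), (3, 'Videos');

-- Hypothetical column: the id of the commented article, photo or video.
ALTER TABLE `comments`
  ADD COLUMN `item` int(11) DEFAULT NULL AFTER `type`;

-- All comments on article 42, newest first.
SELECT c.`id`, c.`createdBy`, c.`createdAt`, c.`content`
FROM `comments` AS c
JOIN `comment_types` AS t ON t.`id` = c.`type`
WHERE t.`name` = 'Articles'
  AND c.`item` = 42
ORDER BY c.`createdAt` DESC;
```

With this layout `type` can carry a foreign key to `comment_types`, but a column like `item` cannot carry one foreign key, because it points into a different table depending on the type.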
Advantages:
- Maintenance (adding / removing fields),
- Adding new comment types "on the fly".
Disadvantages:
- Harder to migrate / export,
- Possible performance drop when querying a large dataset.
Discussion:
- Which storage option will be better in terms of query performance (assume that the dataset IS big enough for this to matter)?
- Performance again: will adding an INDEX on type remove or drastically reduce that performance drop?
- Which storage option will be better in terms of management and possible future migration? (Distributed will be better, of course, but let's see whether the centralized one is really that far behind.)
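One way to explore the index question empirically is to add the index and ask MySQL which plan it chooses. The sketch below uses only columns from the `comments` table above; the index names, the composite index, and the sample query are assumptions chosen for illustration. With only a handful of distinct types, an index on `type` alone is not very selective, so a composite index may be worth comparing.

```sql
-- Single-column index asked about in the question.
ALTER TABLE `comments`
  ADD INDEX `idx_comments_type` (`type`);

-- Composite alternative (an assumption, not from the original schema):
-- lets MySQL filter by type and read rows already ordered by createdAt.
ALTER TABLE `comments`
  ADD INDEX `idx_comments_type_created` (`type`, `createdAt`);

-- Inspect which index the optimizer picks for a typical listing query.
EXPLAIN
SELECT `id`, `createdBy`, `content`
FROM `comments`
WHERE `type` = 1
ORDER BY `createdAt` DESC
LIMIT 20;
```

Comparing the `key` and `rows` columns of the EXPLAIN output with and without each index, on a realistically sized dataset, should show how much of the drop the index recovers.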