I have an article table, it can be associated with many categories.
Here is solution 1 with FIND_IN_SET:
CREATE TABLE `article` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`cat` varchar(512) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5090104 DEFAULT CHARSET=utf8mb4
And here is solution 2 with table join:
CREATE TABLE `article_2` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5511464 DEFAULT CHARSET=utf8mb4
CREATE TABLE `article_cat` (
`cat` int(10) unsigned NOT NULL,
`article_id` int(10) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`cat`,`article_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Then, here is some query comparison:
mysql> select count(1) from article;
+----------+
| count(1) |
+----------+
| 5207355 |
+----------+
1 row in set (1.12 sec)
mysql> select count(1) from article_2;
+----------+
| count(1) |
+----------+
| 5207313 |
+----------+
1 row in set (0.74 sec)
mysql> select count(1) from article_cat;
+----------+
| count(1) |
+----------+
| 20589375 |
+----------+
1 row in set (2.85 sec)
mysql> select count(1) from `article` where FIND_IN_SET('2',`article`.`cat`);
+----------+
| count(1) |
+----------+
| 2157053 |
+----------+
1 row in set (1.22 sec)
mysql>
mysql> select count(1) from `article_2` u INNER JOIN article_cat c ON c.article_id = u.id WHERE c.cat =2;
+----------+
| count(1) |
+----------+
| 2156622 |
+----------+
1 row in set (13.72 sec)
mysql> select * from `article` where FIND_IN_SET('2',`article`.`cat`) limit 2000000,10;
+---------+-----------------+---------------------+---------------------+
| id | cat | created_at | updated_at |
+---------+-----------------+---------------------+---------------------+
| 5132154 | 2,5,7 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132155 | 1,2,3,5,6,8,9 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132156 | 1,2,6,7 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132157 | 1,2,3,5,9 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132158 | 1,2,3,4,6,7,9 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132159 | 1,2,5,6,7 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132160 | 2,3,4,5,7,9 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132161 | 1,2,5,9 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132164 | 1,2,3,4,6,7,8,9 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
| 5132166 | 1,2,3,4,5,8 | 2019-05-13 15:24:58 | 2019-05-13 15:24:58 |
+---------+-----------------+---------------------+---------------------+
10 rows in set (1.28 sec)
mysql> select * from `article_2` u INNER JOIN article_cat c ON c.article_id = u.id WHERE c.cat =2 limit 2000000,10 ;
+---------+---------------------+---------------------+-----+------------+---------------------+---------------------+
| id | created_at | updated_at | cat | article_id | created_at | updated_at |
+---------+---------------------+---------------------+-----+------------+---------------------+---------------------+
| 5133109 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133109 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133110 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133110 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133113 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133113 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133116 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133116 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133117 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133117 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133120 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133120 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133124 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133124 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133133 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133133 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133137 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133137 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
| 5133138 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 | 2 | 5133138 | 2019-05-13 15:25:01 | 2019-05-13 15:25:01 |
+---------+---------------------+---------------------+-----+------------+---------------------+---------------------+
10 rows in set (14.01 sec)
From what I know, FIND_IN_SET does a full scan on columns, it doesn't benefit from an index, but here I see it has a better performance than join table solution, is this normal?
would it be a good idea to use FIND_IN_SET where the table article size increased to 10000000+? if not, then when it would be better to use FIND_IN_SET?
UPDATE:
FIND_IN_SET does perform better in some cases, the only problem is that is not relational database design style.