I know this question has been asked a few times but I couldn't make my query fast enough by reading those answers.
Basically I have a table here with 400k rows. It used to have more than 1.8m rows bu the query time is over 17 secs so I have a cron job to cut off the records older than 5 days in that table to keep the records around 400k rows so the query time is just over 5 secs and 5 secs is still slow. We have another a few tables that involve more than 2m records and use JOIN so I prefer to solve this trend table first to gain more exp then touch the others to increase the query performance in more complicated cases.
Data structure:
| _id | doctype | subtype | term | user_id | nug_id | source | timestamp | confidence |
|-----|---------|---------|------|---------|---------|--------|-----------|------------|
| 123 | post | keyword | games| 1000 | 200 | twitter| 143389203 | 0.0123 |
I have indexed term
, timestamp
, source
, confidence
.
Normally my query is:
SELECT term, SUM(confidence) AS relevance FROM trends
WHERE source IN ("twitter", "tumblr", "instagram", "post", "flickr")
GROUP BY term ORDER BY relevance DESC
And here is my result:
Showing rows 0 - 29 (165032 total, Query took 5.8050 sec)
So what should I do next to optimize the index or query to increase the performance. I can now foresee that how bad my query time is gonna be when I query with JOIN.
Add1: Sorry I forgot to attach the EXPLAIN output.
Add2: Table structure
CREATE TABLE `trends` (
`_id` bigint(20) NOT NULL AUTO_INCREMENT,
`doctype` varchar(10) DEFAULT NULL,
`subtype` varchar(20) DEFAULT NULL,
`term` varchar(200) DEFAULT NULL,
`user_id` varchar(100) DEFAULT NULL,
`nug_id` varchar(100) DEFAULT NULL,
`timestamp` bigint(20) DEFAULT NULL,
`source` varchar(100) DEFAULT NULL,
`confidence` float DEFAULT NULL,
PRIMARY KEY (`_id`),
KEY `confidence` (`confidence`),
KEY `give_me_trends` (`user_id`,`source`),
KEY `term` (`term`,`source`),
KEY `timestamp` (`timestamp`,`confidence`),
KEY `source` (`source`)
) ENGINE=InnoDB AUTO_INCREMENT=95350350 DEFAULT CHARSET=utf8
Add3:
After created a new table called test_trends
and copied the data from the trends
table, I tested with source
column as integer. Also I removed two columns doctype
and subtype
as they are not needed at all. Queried as below:
SELECT term, SUM(confidence) AS relevance FROM test_trends
WHERE source IN (1,2,3,4,5,6,7)
GROUP BY term ORDER BY relevance DESC
in 5.4802 sec.
EXPLAIN as below:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|-----|-------------|-------------|--------|-------------------|---------|-----------|--------|--------|----------------------------------------------|
| 1 | SIMPLE | test_trends | index | source,source_2 | term_2 | 603 | NULL | 354324 | Using where; Using temporary; Using filesort |
Add4:
My test table structure:
CREATE TABLE `test_trends` (
`_id` bigint(20) NOT NULL AUTO_INCREMENT,
`term` varchar(200) DEFAULT NULL,
`user_id` varchar(100) DEFAULT NULL,
`nug_id` varchar(100) DEFAULT NULL,
`timestamp` bigint(20) DEFAULT NULL,
`source` tinyint(1) DEFAULT NULL,
`confidence` float DEFAULT NULL,
PRIMARY KEY (`_id`),
KEY `confidence` (`confidence`),
KEY `give_me_trends` (`user_id`,`source`),
KEY `term` (`term`,`source`),
KEY `timestamp` (`timestamp`,`confidence`),
KEY `source` (`source`),
KEY `term_2` (`term`),
KEY `source_2` (`source`,`confidence`,`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=95354268 DEFAULT CHARSET=utf8
Also I indexed term
, source
, confidence
, timestamp
.
Add5:
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 0 | PRIMARY | 1 | _id | A | 379365 | NULL | NULL | | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | confidence | 1 | confidence | A | 18 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | give_me_trends | 1 | user_id | A | 149 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | give_me_trends | 2 | source | A | 556 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | term | 1 | term | A | 379365 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | term | 2 | source | A | 379365 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | timestamp | 1 | timestamp | A | 13548 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | timestamp | 2 | confidence | A | 189682 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | source | 1 | source | A | 107 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | term_2 | 1 | term | A | 379365 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | source_2 | 1 | source | A | 18 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | source_2 | 2 | confidence | A | 189 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_trends | 1 | source_2 | 3 | timestamp | A | 189682 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+