I have a big table with over 3B rows and my query now takes 3 days to execute.
This is my query:
insert into frequencywords (word, frequency, filename)
select word, count(*) as frequency, max(filename)
from allwordstemp
group by word
Basically, my query group by word from allwordstemp table, and I want to know the filename when frequency = 1 that's why I added max(filename) because filename needs to be contained in an aggregate function such as max. I also don't need the value of filename if frequency > 1. Both tables have 2 indexes on word and filename.
allwordstemp table (filename is an id file):
CREATE TABLE `allwordstemp` (
`word` varchar(45) DEFAULT NULL,
`filename` int(11) DEFAULT NULL,
KEY `idx_allwordstemp_word` (`word`),
KEY `idx_allwordstemp_filename` (`filename`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
frequencywords table:
CREATE TABLE `frequencywords` (
`word` varchar(45) DEFAULT NULL,
`frequency` int(11) DEFAULT NULL,
`filename` int(11) DEFAULT NULL,
KEY `idx_frequencywords_word` (`word`),
KEY `idx_frequencywords_frequency` (`frequency`),
KEY `idx_frequencywords_filename` (`filename`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EXPLAIN SELECT:
+----+-------------+--------------+------------+-------+---------------+-----------------------+---------+------+------------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+-----------------------+---------+------+------------+----------+-------------+
| 1 | SIMPLE | allwordstemp | NULL | index | NULL | idx_allwordstemp_word | 138 | NULL | 3487864881 | 100.00 | Using index |
+----+-------------+--------------+------------+-------+---------------+-----------------------+---------+------+------------+----------+-------------+
How can I make my query faster?