We've been running into a very odd problem on our MySQL RDS instances with SELECT COUNT(*)
commands and after three times of it happening and no luck finding a solution, I figure I'd ask here too.
The issue happens when (but not always) we do a SELECT COUNT(*) FROM
on relatively large (1M+ rows) tables. The query, for some reason, spikes the server's CPU to 100% usage and takes way longer than it should.
We try to kill it via the KILL {threadId} command and the query will just switch on the process list to status "killed" but nothing will happen.
I understand that running KILL on the thread only sets a flag that the query should be terminated - there's no immediate way to terminate it - but this is literally a SELECT COUNT(*)
operation on a table to see how many rows are there. No WHERE
clause, no JOIN
s, no need for the optimizer to create a virtual table and nothing to undo.
This has happened in production but also in our read-only replica mirror RDS, so it's not even a deadlock with other transactions [save for the replica process itself perhaps?] -- we couldn't find any notable deadlocks in the INNODB STATUS log either.
We can't figure out why
- such a simple command just up and murders the CPU
- why the thread just won't die.
We took the liberty last time to let the command keep running overnight to see if it was "just a matter of time and it will get killed when it's done processing whatever it is doing" but after 9 hours we had to reboot the server.
All for times this happened - thrice in Replica and unfortunately once in production - we had no choice but to reboot.
I can't tell if this is a MySQL bug [we're running 8.0.19] or what could be happening here... Any help on where to look or if this is a known issue would be greatly appreciated.
[[EDIT]] Ran the same command today to test for deadlocks. No deadlocks show up in the SHOW ENGINE INNODB STATUS results, but we noticed that the CPU only spikes once the KILL command is issued. The query itself only showed a moderate climb in Read IOPS but CPU usage was minimal - in fact it was actually falling when the SELECT was run, and only climbed once the KILL was issued.
[UPDATE]
@zedfoxus the table does have a primary key autoincrement column. Here's the EXPLAIN of the original query:
And here's the explain of a select count on the id field:
[UPDATE 2] as recommended in comments by @zedfoxus, I tried running
select count(*)
from push_notification_async_queue
use index (primary)
The query ran successfully in 6 minutes. As a test I decided to run the query and kill it preemptively, which caused the bad result to repeat.
Interesting note: the select with use index(primary) ran in 6 minutes but the CPU usage only peaked at 70% and never really settled too high.
The killed query, however, immediately climbed to 100% and stayed there for 12+ minutes, so it's not that it was just ignoring the kill flag and running normally....
[UPDATE 3] As requested by @Wilsonhauck, here's the SHOW CREATE TABLE:
CREATE TABLE `push_notification_async_queue` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`customer_id` varchar(50) DEFAULT NULL,
`store_id` smallint DEFAULT NULL,
`token` varchar(256) NOT NULL,
`title` varchar(47) DEFAULT NULL,
`body` longtext NOT NULL,
`image` text,
`url` text,
`category` varchar(256) DEFAULT NULL,
`provider` varchar(50) DEFAULT NULL,
`sent_on` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IX_push_notification_async_queue-sent_on` (`sent_on`)
) ENGINE=InnoDB AUTO_INCREMENT=19082441 DEFAULT CHARSET=utf8
And here's the SHOW TABLE STATUS for it [apologies, I don't know how to copy a table properly here]:
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
push_notification_async_queue InnoDB 10 Dynamic 14616288 2894 42310041600 0 302432256 8388608 19082441 2021-09-24 23:38:49 2021-12-10 23:44:42 utf8_general_ci "" ""