I have a SQL query, which I know is inefficient, based on mysql.slow_query log. I am using MySQL8.
I have three different tables, see show create table at bottom. Basically, a catch belongs to exactly one beat, a beat can have many catches. A catch belongs to exactly one river, a river can have many catches. A beat belongs to exactly one river, a river can have many beats. I guess you can say that the link from catch to river is excess, but maybe that is done for easing other queries elsewhere which is not dependent on beats.
The query is as follows:
select date
from catch_logs
where catch_logs.removed = false
and exists (select * from rivers where catch_logs.river_id = rivers.id and active_on_laksebors = 1 and rivers.deleted_at is null)
and exists (select * from beats where catch_logs.beat_id = beats.id and active_on_laksebors = 1)
and no_catch = false
order by date asc
limit 1;
So the query does a join with two other tables. When running explain
on the query, I get (a bit abbreviated, included what I think is relevant):
table | type | possible_keys | key | rows | filtered | extra |
---|---|---|---|---|---|---|
catch_logs | ref | < several indexes > | < several indexes > | 1656 | 1 | Using where; Using index |
beats | eq_ref | PRIMARY | PRIMARY | 1 | 100 | Using where |
rivers | all | PRIMARY | null | 554 | 10 | Using where; Using temporary; Using filesort |
554 is the number of entries in rivers. Beats has ~1500 rows, but only checks 1. Basically, what I want to achieve, is that both beats and rivers only check 1 row, as I actually have this id from catch_logs.
I have seen in the MySQL documentation for type
, that all
, which is used in rivers, is the worst join type, and states that:
A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.
As far as I can see, from catch_logs perspective there is no difference whether it accesses beats or rivers. The only thing I can think of which is different, is that beats also has a reference to *rivers.
I have tried creating various indices, see create table definition of catch_logs, but without any changes in my result.
How can I make an index, so that I only check 1 single row of both beats and rivers? Part questions which may point in some direction
- Why is beats only have "Using where" under "Extra", while rivers has "Using where; Using temporary; Using filesort"
- rivers uses
PRIMARY
for bothpossible_keys
andkeys
, but rivers only has this value forpossible_keys
, and null forkeys
The docs list the following explanation, but I cannot understand why it chooses to do so:
It is possible that key may name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
Why is my possible_key "not suitable for looking up rows"?
I have improved my query somewhat by selecting min(date)
instead of using order_by date asc, limit 1
, but still my joins are creating issues in terms of performance
EDIT: Here are the table definitions:
CREATE TABLE `catch_logs` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`date` date DEFAULT NULL,
`river_id` int(10) unsigned DEFAULT NULL,
`beat_id` int(10) unsigned DEFAULT NULL,
`no_catch` tinyint(1) NOT NULL DEFAULT '0',
`removed` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `catch_logs_beat_id_foreign` (`beat_id`),
KEY `catch_logs_river_id_foreign` (`river_id`),
KEY `catch_logs_removed_no_catch_river_id_date_beat_id_index` (`removed`,`no_catch`,`river_id`,`date`,`beat_id`),
CONSTRAINT `catch_logs_beat_id_foreign` FOREIGN KEY (`beat_id`) REFERENCES `beats` (`id`) ON DELETE SET NULL,
CONSTRAINT `catch_logs_river_id_foreign` FOREIGN KEY (`river_id`) REFERENCES `rivers` (`id`) ON DELETE SET NULL,
) ENGINE=InnoDB AUTO_INCREMENT=194046 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `rivers` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`slug` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`county_id` int(10) unsigned DEFAULT NULL,
`latitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`longitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`active_on_laksebors` tinyint(1) NOT NULL DEFAULT '0',
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `rivers_county_id_foreign` (`county_id`),
CONSTRAINT `rivers_county_id_foreign` FOREIGN KEY (`county_id`) REFERENCES `counties` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=662 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `beats` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`river_id` int(10) unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`removed` tinyint(1) NOT NULL DEFAULT '0',
`latitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`longitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`active_on_laksebors` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `beats_river_id_foreign` (`river_id`),
CONSTRAINT `beats_river_id_foreign` FOREIGN KEY (`river_id`) REFERENCES `rivers` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1868 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Here is explain format=json
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "4516.29"
},
"ordering_operation": {
"using_temporary_table": true,
"using_filesort": true,
"cost_info": {
"sort_cost": "806.13"
},
"nested_loop": [
{
"table": {
"table_name": "rivers",
"access_type": "ALL",
"possible_keys": [
"PRIMARY"
],
"rows_examined_per_scan": 557,
"rows_produced_per_join": 5,
"filtered": "1.00",
"cost_info": {
"read_cost": "56.89",
"eval_cost": "0.56",
"prefix_cost": "57.45",
"data_read_per_join": "28K"
},
"used_columns": [
"id",
"active_on_laksebors",
"deleted_at"
],
"attached_condition": "((`database`.`rivers`.`active_on_laksebors` = 1) and (`database`.`rivers`.`deleted_at` is null))"
}
},
{
"table": {
"table_name": "catch_logs",
"access_type": "ref",
"possible_keys": [
"catch_logs_beat_id_foreign",
"catch_logs_river_id_foreign",
"catch_logs_removed_no_catch_river_id_date_beat_id_index"
],
"key": "catch_logs_removed_no_catch_river_id_date_beat_id_index",
"used_key_parts": [
"removed",
"no_catch",
"river_id"
],
"key_length": "7",
"ref": [
"const",
"const",
"database.rivers.id"
],
"rows_examined_per_scan": 1447,
"rows_produced_per_join": 8061,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "25.15",
"eval_cost": "806.13",
"prefix_cost": "888.73",
"data_read_per_join": "24M"
},
"used_columns": [
"id",
"date",
"river_id",
"beat_id",
"no_catch",
"removed"
],
"attached_condition": "(`database`.`catch_logs`.`beat_id` is not null)"
}
},
{
"table": {
"table_name": "beats",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"id"
],
"key_length": "4",
"ref": [
"database.catch_logs.beat_id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 806,
"filtered": "10.00",
"cost_info": {
"read_cost": "2015.31",
"eval_cost": "80.61",
"prefix_cost": "3710.17",
"data_read_per_join": "4M"
},
"used_columns": [
"id",
"active_on_laksebors"
],
"attached_condition": "(`database`.`beats`.`active_on_laksebors` = 1)"
}
}
]
}
}
}