I'm not mysql guy, but I've got opportunity to work with that recently - I've got to optimize some query on Mysql5.7 (I have to cover that also on 5.6, but started with 5.7 as it has apparently more info in explain) running on AWS Aurora. It takes a lot of time, some joins are involved, etc. I started with cutting branches and choose to start 'debugging' with two tables only. These are not too big (~2M and ~1.5M rows), but well, generally I consider them as not too greatly designed (primary keys on varchar(255) columns, etc.).
The thing is that I wanted to take a look into so I used explain format=json
and I'm trying to get any insight from that.
So, let's say that for
select cc.id, cc.col1, cc.col2, ct.col1
from my_table cc
inner join my_table ct on ct.cc_id = cc.id
I am getting something like
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "5630369.56"
},
"nested_loop": [
{
"table": {
"table_name": "cc",
"access_type": "index",
"possible_keys": [
"PRIMARY"
],
"key_length": "258",
"rows_examined_per_scan": 1248725,
"rows_produced_per_join": 1248725,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "3732979.00",
"eval_cost": "249745.00",
"prefix_cost": "3982724.00",
"data_read_per_join": "1G"
}
}
},
{
"table": {
"table_name": "ct",
"access_type": "ref",
"possible_keys": [
"cc_id_idx"
],
"key": "cc_id_idx",
"key_length": "257",
"ref": [
"cc.id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1373037,
"filtered": "100.00",
"cost_info": {
"read_cost": "1373037.97",
"eval_cost": "274607.59",
"prefix_cost": "5630369.56",
"data_read_per_join": "3G"
},
"used_columns": [
"id",
"col_1",
"col_2",
...
]
}
}
]
}
}
and I have problems with understanding what happens here exactly. I think that mysql makes nested join where it starts with table cc
and then for each row it hits to ct
(access_type
is ref
and "rows_examined_per_scan": 1
). It makes it > 1M times ("rows_produced_per_join": 1373037
). Is that correct? I was looking for any documentation for that, but I didn't found any specifing info about how read these values in context of joining - maybe just my google-fu is not strong enough. Could any give me any clue about that?
(As I'm looking for any options to speed it up, I wanted to force mysql to make a hash join, which is available on Aurora (I was trying on Aurora 2.09) in form of /*+ HASH_JOIN(cc) */
, but it didn't affect the query plan in any case - but it's rather issue for another question.)