Optimizing a slow MySQL query with self-joins

Question

I have data that is structured as follows:

Parent   |    Child
散       |    ⺙
⺙       |    
⺙       |    乂
散       |    龷
散       |    月

I am using the following query to return all the ancestors up to two levels above a search term (i.e., return parents and grandparents only).

SELECT a.parent AS level3, b.parent AS level2, c.parent AS level1
FROM decomposition_dup AS a
LEFT JOIN decomposition_dup AS b ON a.parent LIKE b.child
LEFT JOIN decomposition_dup AS c ON b.parent LIKE c.child
WHERE a.child LIKE '$searchterm'

The problem is that it is quite a slow query (~5 seconds). My EXPLAIN shows:

I have already indexed the appropriate columns. Have I made an error? Or is there a better way to structure my query?

EDIT: Here is the table structure:

CREATE TABLE `decomposition_dup` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `parent` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 `structure` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 `child` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `parent` (`parent`),
 KEY `child` (`child`),
 KEY `parent_2` (`parent`,`child`)
) ENGINE=InnoDB AUTO_INCREMENT=211929 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Tables `b` and `c` have a product number of rows which is rather large. So I'm not actually worried about your running time. — Tim Biegeleisen, Jan 25 '16 at 09:48
@TimBiegeleisen - They are all the same table. I was under the impression that ~165,000 rows isn't a big table though? — Daniel, Jan 25 '16 at 09:51
Well if you have to compare `150K+` records against `150K+` records, this is significant. Are you sure the indices are being used? — Tim Biegeleisen, Jan 25 '16 at 09:52
Your column `child` doesn't contain match patterns such as `'1%'`, so why do you use `LIKE`? Are you trying to confuse the DBMS and hide your intentions? Use `ON a.parent = b.child` and `ON a.parent = c.child` instead. Does `$searchterm` contain a pattern? Otherwise use `=` there, too (`WHERE a.child = $searchterm`). — Thorsten Kettner, Jan 25 '16 at 09:56
@ThorstenKettner You may have just hit it. Is his use of `LIKE` not using the indices? Combine that with 100K x 100K records, and this explains it. — Tim Biegeleisen, Jan 25 '16 at 09:57
Why are there duplicates in the table? And can a child have several parents? — Thorsten Kettner, Jan 25 '16 at 10:00
@ThorstenKettner - Sorry, fixed the table contents. No duplicates but yes a child can have multiple parents. — Daniel, Jan 25 '16 at 10:07
@TimBiegeleisen - I use `LIKE` instead of `=` because `=` will return results for characters that are not the same but are similar. E.g., ⻐(not in Unicode) and 钅(U+9485). `LIKE` only returns the character that was searched. — Daniel, Jan 25 '16 at 10:09
I'm not questioning your logic, I'm questioning whether your use of `LIKE` is rendering the indices not useful, hence the long running time. — Tim Biegeleisen, Jan 25 '16 at 10:10
@TimBiegeleisen - I understand. From what I have read and from what the `EXPLAIN` output says, `LIKE` can use indices. I tried to rerun the queries with `=` instead, but it simply won't work in this case so I cannot compare. — Daniel, Jan 25 '16 at 10:17
@Daniel Galletta: It is strange that `=` behaves like this. I'd consider this a bug. Maybe there is some setting influencing this? LIKE is made to look for patterns, e.g. `b.parent LIKE '%1%' would look for all parent containing a 1. The index *can* be used, but it is very likely not to be used efficiently. And why do your parent/child IDs contain strange charcters that need special treatment? Your sample shows integers only, and this is what I would expect in your scenario. — Thorsten Kettner, Jan 25 '16 at 10:22
My general advice: You say you have an issue with `=`. Don't use `LIKE` to circumvent this. LIKE is not made for this. Instead make a new request asking people why `=` is broken in your MySQL version and how to fix it. — Thorsten Kettner, Jan 25 '16 at 10:24
@ThorstenKettner - Thank you that is good advice. I replaced the Chinese characters with numbers because I thought it would be easier for people to understand. I've updated the data in my question. — Daniel, Jan 25 '16 at 10:25
Ah, I see, your table shows how the Chinese characters are composed. Unfortunately all I can tell you is `LIKE` is probably slowing down your query, but I cannot tell you how to fix it. I hope you'll get the answer in a separate request. — Thorsten Kettner, Jan 25 '16 at 10:36
Please provide `SHOW CREATE TABLE`. You apparently don't have the _appropriate_ indexes. Are you using `CHARACTER SET utf8mb4`; you should be. — Rick James, Jan 25 '16 at 18:05
`LIKE` is very definitely slowing the query down -- the Optimizer cannot tell whether you have wild cards, so it scans the entire index instead of going for the one (or few) relevent rows. — Rick James, Jan 25 '16 at 18:07
@RickJames - I have edited my question to include the table structure. The table does use `CHARACTER SET utf8m4b`. — Daniel, Jan 25 '16 at 22:57
Table looks ok. Did changing `LIKE` to `=` solve the performance problem (and change the `EXPLAIN`)? — Rick James, Jan 25 '16 at 23:52
@RickJames - Unfortunately I cannot simply change `LIKE` to `=` because of the problem I outlined in this question: http://stackoverflow.com/questions/35004672/mysql-returns-incorrect-utf8-extended-characters-in-some-cases-only — Daniel, Jan 25 '16 at 23:55

score 0 · Answer 1 · edited May 23 '17 at 12:23

Your query is fine. It uses the appropriate indexes for b and c (USING INDEX) but may however be improved by adding a covering index on

(`a.`.parent, a.`child`)

My conjecture is that the projection phase takes a long time, not the computation time (because there are lots of records to output). You can check by yourself with some mysql profiling.

About LIKE vs = behavior 'problem', I ran into the same issue myself recently and you may enjoy reading further about it there.

score 0 · Accepted Answer · edited May 23 '17 at 11:59

0

LIKE to =
Set COLLATE utf8mb4_unicode_520_ci on parent and child if you want 'similar' characters to be treated equal. Or use COLLATE utf8mb4_general_ci for them to be unequal. Example of such

edited May 23 '17 at 11:59

Community

1
1

answered Jan 26 '16 at 00:40

Rick James

135,179
13
127
222

Optimizing a slow MySQL query with self-joins

2 Answers2