1

I have data that is structured as follows:

Parent   |    Child
散       |    ⺙
⺙       |    
⺙       |    乂
散       |    龷
散       |    月

I am using the following query to return all the ancestors up to two levels above a search term (i.e., return parents and grandparents only).

SELECT a.parent AS level3, b.parent AS level2, c.parent AS level1
FROM decomposition_dup AS a
LEFT JOIN decomposition_dup AS b ON a.parent LIKE b.child
LEFT JOIN decomposition_dup AS c ON b.parent LIKE c.child
WHERE a.child LIKE '$searchterm'

The problem is that it is quite a slow query (~5 seconds). My EXPLAIN shows:

enter image description here

I have already indexed the appropriate columns. Have I made an error? Or is there a better way to structure my query?

EDIT: Here is the table structure:

CREATE TABLE `decomposition_dup` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `parent` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 `structure` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 `child` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `parent` (`parent`),
 KEY `child` (`child`),
 KEY `parent_2` (`parent`,`child`)
) ENGINE=InnoDB AUTO_INCREMENT=211929 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Daniel
  • 433
  • 1
  • 3
  • 18
  • Tables `b` and `c` have a product number of rows which is rather large. So I'm not actually worried about your running time. – Tim Biegeleisen Jan 25 '16 at 09:48
  • @TimBiegeleisen - They are all the same table. I was under the impression that ~165,000 rows isn't a big table though? – Daniel Jan 25 '16 at 09:51
  • Well if you have to compare `150K+` records against `150K+` records, this is significant. Are you sure the indices are being used? – Tim Biegeleisen Jan 25 '16 at 09:52
  • Your column `child` doesn't contain match patterns such as `'1%'`, so why do you use `LIKE`? Are you trying to confuse the DBMS and hide your intentions? Use `ON a.parent = b.child` and `ON a.parent = c.child` instead. Does `$searchterm` contain a pattern? Otherwise use `=` there, too (`WHERE a.child = $searchterm`). – Thorsten Kettner Jan 25 '16 at 09:56
  • @ThorstenKettner You may have just hit it. Is his use of `LIKE` not using the indices? Combine that with 100K x 100K records, and this explains it. – Tim Biegeleisen Jan 25 '16 at 09:57
  • Why are there duplicates in the table? And can a child have several parents? – Thorsten Kettner Jan 25 '16 at 10:00
  • @ThorstenKettner - Sorry, fixed the table contents. No duplicates but yes a child can have multiple parents. – Daniel Jan 25 '16 at 10:07
  • @TimBiegeleisen - I use `LIKE` instead of `=` because `=` will return results for characters that are not the same but are similar. E.g., ⻐(not in Unicode) and 钅(U+9485). `LIKE` only returns the character that was searched. – Daniel Jan 25 '16 at 10:09
  • I'm not questioning your logic, I'm questioning whether your use of `LIKE` is rendering the indices not useful, hence the long running time. – Tim Biegeleisen Jan 25 '16 at 10:10
  • @TimBiegeleisen - I understand. From what I have read and from what the `EXPLAIN` output says, `LIKE` can use indices. I tried to rerun the queries with `=` instead, but it simply won't work in this case so I cannot compare. – Daniel Jan 25 '16 at 10:17
  • @Daniel Galletta: It is strange that `=` behaves like this. I'd consider this a bug. Maybe there is some setting influencing this? LIKE is made to look for patterns, e.g. `b.parent LIKE '%1%' would look for all parent containing a 1. The index *can* be used, but it is very likely not to be used efficiently. And why do your parent/child IDs contain strange charcters that need special treatment? Your sample shows integers only, and this is what I would expect in your scenario. – Thorsten Kettner Jan 25 '16 at 10:22
  • My general advice: You say you have an issue with `=`. Don't use `LIKE` to circumvent this. LIKE is not made for this. Instead make a new request asking people why `=` is broken in your MySQL version and how to fix it. – Thorsten Kettner Jan 25 '16 at 10:24
  • @ThorstenKettner - Thank you that is good advice. I replaced the Chinese characters with numbers because I thought it would be easier for people to understand. I've updated the data in my question. – Daniel Jan 25 '16 at 10:25
  • Ah, I see, your table shows how the Chinese characters are composed. Unfortunately all I can tell you is `LIKE` is probably slowing down your query, but I cannot tell you how to fix it. I hope you'll get the answer in a separate request. – Thorsten Kettner Jan 25 '16 at 10:36
  • Please provide `SHOW CREATE TABLE`. You apparently don't have the _appropriate_ indexes. Are you using `CHARACTER SET utf8mb4`; you should be. – Rick James Jan 25 '16 at 18:05
  • `LIKE` is very definitely slowing the query down -- the Optimizer cannot tell whether you have wild cards, so it scans the entire index instead of going for the one (or few) relevent rows. – Rick James Jan 25 '16 at 18:07
  • @RickJames - I have edited my question to include the table structure. The table does use `CHARACTER SET utf8m4b`. – Daniel Jan 25 '16 at 22:57
  • Table looks ok. Did changing `LIKE` to `=` solve the performance problem (and change the `EXPLAIN`)? – Rick James Jan 25 '16 at 23:52
  • @RickJames - Unfortunately I cannot simply change `LIKE` to `=` because of the problem I outlined in this question: http://stackoverflow.com/questions/35004672/mysql-returns-incorrect-utf8-extended-characters-in-some-cases-only – Daniel Jan 25 '16 at 23:55

2 Answers2

0

Your query is fine. It uses the appropriate indexes for b and c (USING INDEX) but may however be improved by adding a covering index on

(`a.`.parent, a.`child`)

My conjecture is that the projection phase takes a long time, not the computation time (because there are lots of records to output). You can check by yourself with some mysql profiling.


About LIKE vs = behavior 'problem', I ran into the same issue myself recently and you may enjoy reading further about it there.

Community
  • 1
  • 1
Sebas
  • 21,192
  • 9
  • 55
  • 109
0
  1. LIKE to =
  2. Set COLLATE utf8mb4_unicode_520_ci on parent and child if you want 'similar' characters to be treated equal. Or use COLLATE utf8mb4_general_ci for them to be unequal. Example of such
Community
  • 1
  • 1
Rick James
  • 135,179
  • 13
  • 127
  • 222