0

I have the following MySql table (only 845 rows):

CREATE TABLE `categories_nested_set` (
  `lft` int(11) NOT NULL DEFAULT '0',
  `rgt` int(11) DEFAULT NULL,
  `id` int(11) DEFAULT NULL,
  `category` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`lft`),
  UNIQUE KEY `id` (`id`),
  UNIQUE KEY `rgt` (`rgt`),
  KEY `idx_lftrgtid` (`id`,`lft`,`rgt`),
  KEY `idx_lft` (`lft`),
  KEY `i1` (`lft`) USING BTREE,
  KEY `i2` (`rgt`) USING BTREE,
  KEY `i3` (`id`) USING BTREE,
  CONSTRAINT `fk_categories_nested_set_id_category` FOREIGN KEY (`id`) REFERENCES `categories` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

(you can see I have a whole lot of indexes in there, just in case).

I perform the following self-join query:

SELECT * 
FROM categories_nested_set      AS H
LEFT JOIN categories_nested_set AS I ON (H.lft > I.lft)

Which generates the following EXPLAIN:

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,H,ALL,NULL,NULL,NULL,NULL,845,NULL
1,SIMPLE,I,ALL,"PRIMARY,idx_lft,i1",NULL,NULL,NULL,845,"Range checked for each record (index map: 0x31)"

enter image description here

The EXPLAIN would suggest that MySql is choosing not to use an index, and I cannot understand why. The table definition shows that all the relevant columns are indexed.

In the scope of a far-larger query (5 million rows, 14x tables) this piece is proving to be a massive bottleneck. Any advice would be appreciated.

Thanks,

Rick James
  • 135,179
  • 13
  • 127
  • 222
mils
  • 1,878
  • 2
  • 21
  • 42
  • Does the dataset on which you ran `EXPLAIN` really only have 845 rows? MySQL won't necessarily use an index, if, for example, the data set is sufficiently small. You should get performance stats on the actual larger query. – Tim Biegeleisen May 23 '17 at 04:13
  • Yes, only 845 rows. Different explain on the larger query, obviously far more elaborate, but nonetheless a Full Table Scan. – mils May 23 '17 at 04:27
  • Well you need to do at least one full table scan, for the table on the left side of the join. – Tim Biegeleisen May 23 '17 at 04:27
  • But it does two – mils May 23 '17 at 04:46
  • The left-right trick is nifty. But it belongs in a textbook, not in the real world where performance matters. – Rick James May 23 '17 at 05:38
  • @RickJames what alternative do you suggest for finding "parent categories" recursively? As mentioned below, the real query is a classic nested set query: `LEFT JOIN categories_nested_set AS H ON H.id = C.id_category LEFT JOIN categories_nested_set AS I ON (H.lft BETWEEN I.lft AND I.rgt)` but I wanted to remove any confusion regarding performance of BETWEEN. Thanks – mils May 23 '17 at 05:48
  • `BETWEEN col1 AND col2` can't be optimized. I added a tag. Search for tags `[hierarchical-data] [mysql]` to get some more ideas on a hierarchy. – Rick James May 23 '17 at 15:00

2 Answers2

0

I think you should use this query :

SELECT * FROM categories_nested_set AS H , categories_nested_set AS I where (H.lft > I.lft);

The idea is not to use JOIN because it forces MySql to construct the result by matching one row at a time.

In the solution I propose, the cross-product table is constructed without a join, so it enables InnoDB to fetch rows from the first table (H) independently of the other table, hence allowing it to batch rows. The index on the second table can then be used by MySQL because no link is made from the H table.

On my computer, the proposed solution is about 5 times faster, with a table of 1000 records.

Here is the result of the explain :

EXPLAIN SELECT * FROM categories_nested_set AS H , categories_nested_set AS I where (H.lft > I.lft);

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE H ALL PRIMARY,idx_lft,i1 NULL NULL NULL 921 
1 SIMPLE I ALL PRIMARY,idx_lft,i1 NULL NULL NULL 921 Using where; Using join buffer

Note that you may also improve index use of your request by restricting the columns you are retrieving (although that is not always the case).

Arnaud Tournier
  • 984
  • 11
  • 11
  • However just by curiosity because this query generates a lot of combination, what do you reallly try to extract from your nested set tree structure with this query ? From what I understand it's all the pairs of nodes when one is after the other, but I don't get what you will do with that... THanks – Arnaud Tournier May 23 '17 at 04:53
  • Hi Arnaud, I'm still reading over your answer, but re your question, the real query is a classic nested set query: `LEFT JOIN categories_nested_set AS H ON H.id = C.id_category LEFT JOIN categories_nested_set AS I ON (H.lft BETWEEN I.lft AND I.rgt)` but I wanted to remove any confusion regarding performance of BETWEEN – mils May 23 '17 at 04:57
  • Just ran your query, same problem, here is the EXPLAIN: `id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra 1,SIMPLE,H,ALL,"PRIMARY,idx_lft,i1",NULL,NULL,NULL,845,NULL 1,SIMPLE,I,ALL,"PRIMARY,idx_lft,i1",NULL,NULL,NULL,845,"Range checked for each record (index map: 0x31)"` – mils May 23 '17 at 04:59
  • Just for your information : I tested the queries on 10000 records (49 millions result rows) and the join-free query is a lot lot faster. – Arnaud Tournier May 23 '17 at 04:59
  • Are you sure you did run the exact same query ? Especially the query is not executed in the same way if you extract different fields... If you are sure and get different results the difference might come from the environments. Maybe better to make a SQL script which creates the table from zero and run it inside a docker container with a precise version of mysql. – Arnaud Tournier May 23 '17 at 05:04
0

The LEFT JOIN match all fields in the table. Try to use INNER JOIN instead of it.