2

I have the following table with a multi-value index set up on a JSON integer array:

CREATE TABLE test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  catIds JSON NOT NULL,
  PRIMARY KEY (id),
  KEY test_categories ((CAST(catIds AS UNSIGNED ARRAY)))
);

I've inserted 200,000 records such as:

INSERT INTO test (catIds) VALUES('[123, 456]');
...

The issue is, querying this table on the catIds field with or without the index does not change the execution speed. I've tried querying with both MEMBER OF() and JSON_CONTAINS(), with and without the index; the speeds are the same.

And indeed, EXPLAIN shows that these queries do not use the index:

mysql> EXPLAIN SELECT count(*) FROM test WHERE 51 MEMBER OF (catIds);
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 201416 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+----------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                              |
+-------+------+----------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `count(*)` from `test`.`test` where <cache>(51) member of (`test`.`test`.`catIds`) |
+-------+------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, '51');
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 201416 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+---------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                   |
+-------+------+---------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `count(*)` from `test`.`test` where json_contains(`test`.`test`.`catIds`,<cache>('51')) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Why is the index on catIds not used for these queries? What did I miss?

BenMorel
  • 34,448
  • 50
  • 182
  • 322

2 Answers2

3

You must use a JSON path for your index definition and the predicate in your query.

https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued says:

The only type of expression that is permitted in a multi-valued key part is a JSON path. The path need not point to an existing element in a JSON document inserted into the indexed column, but must itself be syntactically valid.

I tested this:

mysql> alter table test add key bk1 ((cast(catIds->'$[*]' as unsigned array)));
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain SELECT count(*) FROM test WHERE 903 MEMBER OF (catIds->'$[*]');
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | bk1           | bk1  | 9       | const |    8 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+

I have no doubt that using this feature will increase the WTFs per minute during code reviews.

Also keep in mind that MySQL will skip using an index if the optimizer thinks it won't help. Like if the table only has a few rows, or if the value you are searching for occurs in a majority of the rows. This is not specific to the multi-valued index, it has been part of MySQL's optimizer behavior with normal indexes for many years.

Here's an example: I have 4096 rows in my table, but they're all the same. Even if I search for a value that occurs in the table, MySQL detects that it would match a majority of rows (all rows, in this case) and avoids the index.

mysql> select distinct catIds from test;
+--------------+
| catIds       |
+--------------+
| [258.0, 7.0] |
+--------------+
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|     4096 |
+----------+
1 row in set (0.01 sec)

mysql> explain SELECT count(*) FROM test WHERE 258 MEMBER OF (catIds);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 4096 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    Thank you. I can see the index used now. The index makes the query return a bit faster when querying `WHERE 51 MEMBER OF (catIds->'$[*]')` (0.19s vs 0.25s without the index), however I'm surprised that querying `WHERE 51 MEMBER OF (catIds)`, while still not using the index, returns in 0.10s, so twice as fast? How is the index not helping in this situation? Also surprising is that in this particular case, this query with no index is actually faster than normalizing the data and using a separate table for the `catIds` and `JOIN`ing it (roughly on par with the array index on the JSON field). – BenMorel Jan 01 '20 at 22:19
  • I don't know. `MEMBER OF` is a brand new feature, so it could have some maturation to do. I suggest you file a bug at bugs.mysql.com and get them to comment. I also suggest that there is no good way to use JSON in a `WHERE` clause. – Bill Karwin Jan 16 '20 at 16:30
  • 1
    Thank you. Sure, I'm not saying that this design is better than using a normalized database, but JSON has been introduced for a reason; I feel like it can simplify a few designs where an extra table may look overkill, and/or the schema needs more flexibility. Anyway, it's satisfying to play with this shiny new feature! – BenMorel Jan 16 '20 at 20:40
  • Okay, but if you're just playing, then don't expect it to be efficient. Many forms of play are deliberately _inefficient_. – Bill Karwin Jan 16 '20 at 22:05
  • 1
    When I mean playing, I mean use it in production somewhere on a small project, and see how it goes. So performance is always sought after! – BenMorel Jan 16 '20 at 23:28
1

There are few of reasons why multi-valued indexes in MySQL are so slow: *) those are always secondary indexes and thus have penalty compared to primary indexes in InnoDB *) same as regular index it points to the record, which then contains JSON which has to be unpacked in order to be processed. *) unlike regular indexes, multi-valued indexes can't be covering, they always have to fetch row from the table

All this narrows conditions where multi-valued indexes are beneficial. Best conditions for them are: *) high selectivity of the index, the higher the better *) lots of rows in table *) big json docs - this doesn't make index scan faster, but rather plain scans are slow due to out of row blob storage in innodb and thus allows index scan to be shiner

e exo
  • 11
  • 1