5

I'm playing with MySQL 8's new JSON capabilities, in particular multi-value indexes.

I noticed that there are 2 ways to check if a JSON array contains a given value: the MEMBER OF() operator, and the JSON_CONTAINS() function.

They both return the same set of results for every query I make, but surprisingly, MEMBER OF seems to be 3x faster than JSON_CONTAINS.

Example on a table with 200,000 records, with around 700,000 values total in the catIds fields:

CREATE TABLE test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  catIds JSON NOT NULL,
  PRIMARY KEY (id),
  KEY test_categories ((CAST(catIds AS UNSIGNED ARRAY)))
);

INSERT INTO test (catIds) VALUES('[123, 456]');
...

... some 200,000 inserted records later:

mysql> SELECT count(*) FROM test WHERE 51 member of (catIds);
+----------+
| count(*) |
+----------+
|     7287 |
+----------+
1 row in set (0.11 sec)

mysql> SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, '51');
+----------+
| count(*) |
+----------+
|     7287 |
+----------+
1 row in set (0.36 sec)

If first thought that this was because the stringified JSON value '51' was maybe converted on each iteration, so I tried assigning it to a variable first; but this does not make it any faster:

mysql> SET @value = CAST(51 as JSON);
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, @value);
+----------+
| count(*) |
+----------+
|     7287 |
+----------+
1 row in set (0.38 sec)

Unless I'm mistaken, MEMBER OF() and JSON_CONTAINS() are equivalent feature-wise. In this case, why is one of them faster than the other?

BenMorel
  • 34,448
  • 50
  • 182
  • 322

2 Answers2

8

JSON_CONTAINS() does more complex work than MEMBER OF.

JSON_CONTAINS() must parse its second argument, a candidate JSON document that you're searching for within the stored JSON document.

The candidate may not be a simple scalar as you're searching for in your example above. It could be a more complex document with its own nested arrays and objects.

Therefore comparing the candidate to the stored document must potentially compare in a more complex manner, not just searching for a single scalar value, but comparing recursively, all nested elements.

Even if your example search is for a simple scalar value, it still invokes the same code path that might need to search for a complex document. Based on your timing measurement, that code path seems to have more overhead.

Whereas MEMBER OF only searches for a scalar value, and only searches in an array. It can also optimize by using a cached, pre-sorted array.

See https://github.com/mysql/mysql-server/blob/8.0/sql/item_json_func.cc#L3852 for the code.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

The reason is even though it's using the same index, It also matters, how you used that particular index. There are two types of index usage (given by type column or access_type in json format output of EXPLAIN command)

There are two classes of index search in my opinion (or Understanding)

  1. const type / ref type (directly lookup for a key in B+ tree)
  2. range type (find keys in range on index)

when you are using MEMBER OF function it uses ref type for index search, when you use JSON_OVERLAPS it uses range type index search (will do >= and <= key set search and then do intersection), which is obviously slow, this can be verified through EXPLAIN command output.

Output for EXPLAIN SELECT count(*) FROM test WHERE 51 member of (catIds);, you observe that it's using index called test_categories, but using type column value is ref (lookup for 51 in secondary index B+tree, then lookup for the corresponding primary keys). enter image description here

Where as you can observe the output for EXPLAIN SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, '51');, where you can see that type column value is range (find keys in range [51, 51], do some preprocessing and then lookup for these primary keys on clustered indices, then possible de-duping logic on them)

enter image description here

yugandhar
  • 580
  • 7
  • 16