
I have a MySQL 5.7 database with a JSON column containing an array of strings.

I need to remove a varying number of those strings from the array, by value, in one query.

Example JSON:

["1-1-2", "1-1-3", "1-2-2", "1-2-3", "1-1-16", "1-1-17"]

I may need to remove "1-1-16" and "1-1-17" so I end up with:

["1-1-2", "1-1-3", "1-2-2", "1-2-3"]

At other times, I may need to remove just one value, or several more, in one query.

JSON_REMOVE() can accept multiple path arguments, but the problem is that the paths are applied sequentially, left to right, with each removal operating on the document produced by the previous one. That makes it very difficult to use the result of JSON_SEARCH() for each path, because array indexes shift as earlier elements are removed.

For example, this does not work, because the second JSON_SEARCH evaluates against the original document, so the path it returns for '1-1-17' is no longer correct once '1-1-16' has been removed:

UPDATE json_meta
SET document = JSON_REMOVE(document,
    JSON_UNQUOTE(JSON_SEARCH(document, 'one', '1-1-16')),
    JSON_UNQUOTE(JSON_SEARCH(document, 'one', '1-1-17'))
)
WHERE id = 10

You need to do this instead:

UPDATE json_meta
SET document = JSON_REMOVE(document,
    JSON_UNQUOTE(JSON_SEARCH(document, 'one', '1-1-16')),
    JSON_UNQUOTE(JSON_SEARCH(
        JSON_REMOVE(document,
            JSON_UNQUOTE(JSON_SEARCH(document, 'one', '1-1-16'))),
        'one', '1-1-17'))
)
WHERE id = 10

The query grows exponentially more complex with each additional string to remove, because every new path has to be searched against a version of the document that already reflects all the earlier removals.

I'm wondering if the best solution would just be to use a chained REPLACE() with all permutations of commas on each string (i.e. each string with a comma before, a comma after, a comma both before and after).
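For illustration, something like this is what I have in mind (just a sketch; it assumes the column is always serialized with ", " between elements exactly as shown above, so it is clearly fragile):

UPDATE json_meta
SET document = REPLACE(
                 REPLACE(
                   REPLACE(document, ', "1-1-16"', ''),  -- value preceded by another element
                   '"1-1-16", ', ''),                    -- value is the first element
                 '"1-1-16"', '')                         -- value is the only element
WHERE id = 10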

Final note: I found another question with the exact same issue described here. However, that question doesn't have an accepted answer, and the one answer there is very complex. That answer indicates that MySQL 5.6 doesn't have much JSON support; since I'm using MySQL 5.7, is a simpler solution possible?

    Just one more reason not to use MySQL JSON, IMHO. This is trivial if you normalize your tables. – Barmar Jul 21 '21 at 21:17
  • Yes, the _simple_ solution in MySQL 5.6 and MySQL 5.7 and MySQL 8.0 is the same: store multi-valued attributes as one value per row. Then you can delete two rows by their value, not their position. It doesn't matter which one is "first." – Bill Karwin Jul 21 '21 at 21:33
  • you can use a stored procedure where you split up a string and run a loop and delete one by one – nbk Jul 21 '21 at 21:35
  • @Barmar I agree, but not in a position to rearchitect this database right now. :) – jessica Jul 21 '21 at 21:51
  • Best might be to use a stored procedure that removes them in a loop, instead of trying to do it in one query. – Barmar Jul 21 '21 at 21:55
  • After posting, I realized I was overthinking it, and I can safely remove the individual elements in separate queries instead of just one, so I will just do that. Accepted the best answer below, however. – jessica Jul 21 '21 at 22:20

1 Answer


If you can ensure you search for the items in the right order, you can nest them:

SELECT JSON_REMOVE(
  JSON_REMOVE(
    document,
    JSON_UNQUOTE(JSON_SEARCH(document, 'one', '1-1-17'))
  ), 
  JSON_UNQUOTE(JSON_SEARCH(document, 'one', '1-1-16') )
) AS j
FROM json_meta;
+--------------------------------------+
| j                                    |
+--------------------------------------+
| ["1-1-2", "1-1-3", "1-2-2", "1-2-3"] |
+--------------------------------------+

But this can't be done in a single query without knowing the order.

You can do it in two queries: one to get the paths into user variables, then a second that picks which path to apply at each nesting level:

mysql> select json_unquote(json_search(document, 'one', '1-1-17')),
              json_unquote(json_search(document, 'one', '1-1-16'))
       into @path17, @path16 from json_meta;

mysql> select json_remove(json_remove(document, greatest(@path16, @path17)), least(@path16, @path17)) as j from json_meta;
+--------------------------------------+
| j                                    |
+--------------------------------------+
| ["1-1-2", "1-1-3", "1-2-2", "1-2-3"] |
+--------------------------------------+

If you had three or more things to remove, you'd have to sort the paths yourself and build the query in the right order.
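As the comments suggest, another option is a stored procedure that loops over the values and removes one per statement; that sidesteps the ordering problem entirely, because each path is recomputed against the current document. A rough sketch (the procedure and parameter names are just illustrative):

DELIMITER //
CREATE PROCEDURE remove_json_values(IN p_id INT, IN p_values TEXT)
BEGIN
  DECLARE v TEXT;
  WHILE p_values <> '' DO
    -- take the next comma-separated value from the list
    SET v = SUBSTRING_INDEX(p_values, ',', 1);
    -- remove it, recomputing its path against the current document
    UPDATE json_meta
    SET document = JSON_REMOVE(document,
                     JSON_UNQUOTE(JSON_SEARCH(document, 'one', v)))
    WHERE id = p_id
      AND JSON_SEARCH(document, 'one', v) IS NOT NULL;
    -- drop the value just handled from the list
    SET p_values = IF(LOCATE(',', p_values) = 0, '',
                      SUBSTRING(p_values FROM LOCATE(',', p_values) + 1));
  END WHILE;
END//
DELIMITER ;

CALL remove_json_values(10, '1-1-16,1-1-17');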

Another solution is to fetch the whole document into an application where you have access to a JSON library to explode the elements into an array. Then you can eliminate array elements, re-marshal the array back into JSON and update the database.

If you upgrade to MySQL 8.0, you could use JSON_TABLE() to explode the array, filter out the elements you don't want, then implode them back into an array with JSON_ARRAYAGG().

This is all sounding more and more complex. JSON generally makes SQL queries harder, not easier, if you want to access individual elements of an array or fields of an object using SQL.

Demo:

mysql> select j.* from json_meta cross join json_table(document, '$[*]' columns (value varchar(10) path '$')) as j;
+--------+
| value  |
+--------+
| 1-1-2  |
| 1-1-3  |
| 1-2-2  |
| 1-2-3  |
| 1-1-16 |
| 1-1-17 |
+--------+

mysql> select j.* from json_meta cross join json_table(document, '$[*]' columns (value varchar(10) path '$')) as j where value not in ('1-1-16', '1-1-17');
+-------+
| value |
+-------+
| 1-1-2 |
| 1-1-3 |
| 1-2-2 |
| 1-2-3 |
+-------+

mysql> select json_arrayagg(value) as document from json_meta cross join json_table(document, '$[*]' columns (value varchar(10) path '$')) as j where value not in ('1-1-16', '1-1-17');
+--------------------------------------+
| document                             |
+--------------------------------------+
| ["1-1-2", "1-1-3", "1-2-2", "1-2-3"] |
+--------------------------------------+
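To write that aggregated result back for one row, you could reuse the user-variable pattern from above (a sketch; note that JSON_ARRAYAGG() over zero matching rows yields NULL rather than an empty array, so you'd want to guard against removing every element):

mysql> select json_arrayagg(value) into @new_document
       from json_meta cross join json_table(document, '$[*]' columns (value varchar(10) path '$')) as j
       where id = 10 and value not in ('1-1-16', '1-1-17');

mysql> update json_meta set document = @new_document where id = 10;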

The best choice for simplicity and efficiency, as well as ease of code development, is to store multi-valued attributes in a normalized manner. Then you can write the SQL query this way:

DELETE FROM MyAttribute WHERE entity_id = 10 AND value IN ('1-1-16', '1-1-17');
  • > Another solution is to fetch the whole document into an application where you have access to a JSON library to explode the elements into an array. Then you can eliminate array elements, re-marshal the array back into JSON and update the database. This is what we do now, and twice a concurrent request has added to the array while the elimination work is being done. When we "re-marshal" it back in, we lose that change... that's the bug I'm actually trying to fix. Will accept your answer, however, as I think it's comprehensive and accurate. Thanks! – jessica Jul 21 '21 at 22:18
  • Sounds like you have a race condition. You should use some kind of centralized locking mechanism to make sure multiple clients are not working on the same record. For example, I have used MySQL's [GET_LOCK()](https://dev.mysql.com/doc/refman/5.7/en/locking-functions.html#function_get-lock) to do this. – Bill Karwin Jul 21 '21 at 23:03
  • exactly, I had a race condition. I have fixed it by making the updates purely within MySQL, and making separate queries for each value. I believe that should work, so even if someone adds values during the loop, nothing will be lost. – jessica Jul 22 '21 at 18:23