5

PostgreSQL allows rows to be sorted by arrays. It compares the first value of each array, then the second value and so on (fiddle):

select array[2, 4] as "array"
union
select array[10] as "array"
union
select array[2, 3, 4] as "array"
union
select array[10, 11] as "array"
order by "array"
array
[2, 3, 4]
[2, 4]
[10]
[10, 11]

The closest equivalent on MySQL and MariaDB seem to be JSON arrays.

MySQL apparently orders arrays by length more or less randomly (fiddle):

select json_array(2, 4) as `array`
union
select json_array(10) as `array`
union
select json_array(2, 3, 4) as `array`
union
select json_array(10, 11) as `array`
order by `array`
array
[10]
[2, 4]
[10, 11]
[2, 3, 4]

MariaDB somewhat orders by value but does it incorrectly (fiddle). Integers are ordered like strings (10 before 2) and arrays with the same beginning are reversed ([10, 11] before [10]):

select json_array(2, 4) as `array`
union
select json_array(10) as `array`
union
select json_array(2, 3, 4) as `array`
union
select json_array(10, 11) as `array`
order by `array`
array
[10, 11]
[10]
[2, 3, 4]
[2, 4]

Is there the a way to replicate PostgreSQL's array ordering on MySQL and/or MariaDB?

The arrays can have any length and I don't know the maximum length.

The only workaround/hack I see at the moment is concatenating the array into a string and left-padding the values with 0s to the same length: 002.004, 010.011 etc.

Salman A
  • 262,204
  • 82
  • 430
  • 521
Jonas Staudenmeir
  • 24,815
  • 6
  • 63
  • 109
  • *MySQL apparently orders arrays by length* No. https://www.db-fiddle.com/f/jS9u43bHatgg7GKsFMBfY7/3 – Akina Nov 29 '21 at 16:14

5 Answers5

3

It looks like a bug to me. According to docs

Two JSON arrays are equal if they have the same length and values in corresponding positions in the arrays are equal.

If the arrays are not equal, their order is determined by the elements in the first position where there is a difference. The array with the smaller value in that position is ordered first. If all values of the shorter array are equal to the corresponding values in the longer array, the shorter array is ordered first.

But ORDER BY looks not following such rules at all.

This is a DB fiddle for MySQL 8 & 5.7

I'm using CROSS JOIN and explicit comparison to get the expected ordering.

SELECT f.`array`, SUM(f.`array` > g.`array`) cmp
FROM jsons f
CROSS JOIN jsons g
GROUP BY f.`array`
ORDER BY cmp
;

There is another observation for MySQL 5.7, when using subquery, > is doing something like string comparison, it needs to cast to JSON again to get correct result while MySQL8 does not need to do so.

SELECT f.`array`, SUM(CAST(f.`array` AS JSON) > CAST(g.`array` AS JSON)) cmp
FROM (
 select json_array(2, 4) as `array`
 union
 select json_array(10) as `array`
 union
 select json_array(2, 3, 4) as `array`
 union
 select json_array(10, 11) as `array`
) f
CROSS JOIN (
 select json_array(2, 4) as `array`
 union
 select json_array(10) as `array`
 union
 select json_array(2, 3, 4) as `array`
 union
 select json_array(10, 11) as `array`
) g
GROUP BY f.`array`
ORDER BY cmp
;

The above does not work in MariaDB.

https://mariadb.com/kb/en/incompatibilities-and-feature-differences-between-mariadb-106-and-mysql-80/

In MySQL, JSON is compared according to json values. In MariaDB JSON strings are normal strings and compared as strings.

Query below works for MariaDB

WITH RECURSIVE jsons AS (
 select json_array(2, 4) as `array`
 union
 select json_array(10) as `array`
 union
 select json_array(2, 3, 4) as `array`
 union
 select json_array(10, 11) as `array`
),
maxlength AS (
 SELECT MAX(JSON_LENGTH(`array`)) maxlength
 FROM jsons
),
numbers AS (
 SELECT 0 AS n
 FROM maxlength
 UNION ALL
 SELECT n + 1
 FROM numbers
 JOIN maxlength ON numbers.n < maxlength.maxlength - 1
),
expanded AS (
 SELECT a.`array`, b.n, JSON_EXTRACT(a.`array`, CONCAT('$[', b.n, ']')) v
 FROM jsons a
 CROSS JOIN numbers b
),
maxpadding AS (
 SELECT MAX(LENGTH(v)) maxpadding
 FROM expanded
)
SELECT a.`array`
FROM expanded a
CROSS JOIN maxpadding b
GROUP BY a.`array`
ORDER BY GROUP_CONCAT(LPAD(a.v, b.maxpadding, '0') ORDER BY a.n ASC)
ProDec
  • 5,390
  • 1
  • 3
  • 12
  • Thanks for the documentation references. The MySQL one sounds exactly like the behavior I'm looking for. The MariaDB one explains the results I'm getting. – Jonas Staudenmeir Dec 02 '21 at 06:51
  • 1
    They've acknowledged in the documentation, on the same page, that order by is not implemented for nonscalar values. Also MariaDB most likely supports `json_table` so recursive approach is not needed. – Salman A Dec 09 '21 at 07:52
1

Using JSON_VALUE:

WITH cte AS (
  select json_array(2, 4) as `array`
  union
  select json_array(10) as `array`
  union
  select json_array(2, 3, 4) as `array`
  union
  select json_array(10, 11) as `array`
)
select *
from cte
order by CAST(JSON_VALUE(`array`, '$[0]') AS INT),
         CAST(JSON_VALUE(`array`, '$[1]') AS INT),
         CAST(JSON_VALUE(`array`, '$[2]') AS INT)
        -- ...;


-- MySQL 8.0.21+
select *
from cte
order by
 JSON_VALUE(`array`, '$[0]' RETURNING SIGNED),
 JSON_VALUE(`array`, '$[1]' RETURNING SIGNED),
 JSON_VALUE(`array`, '$[2]' RETURNING SIGNED)

db<>fiddle demo

Output:

enter image description here

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • This approach requires me to know the maximum array length, right? – Jonas Staudenmeir Dec 01 '21 at 16:48
  • @JonasStaudenmeir In general yes, but if you try to access non-existing index it will return null. ` CAST(JSON_VALUE(`array`, '$[2]') AS INT)` for json_array(10). Depending on needs you can set up to 10 leading elements. – Lukasz Szozda Dec 01 '21 at 16:50
  • Unfortunately, I don't know the maximum array length in my use case. The arrays could contain dozens or hundreds of values and the query needs to work for all of them. – Jonas Staudenmeir Dec 01 '21 at 17:03
1

The documentation currently says that:

ORDER BY and GROUP BY for JSON values works according to these principles:

[...]

  • Sorting of nonscalar values is not currently supported and a warning occurs.

JSON arrays are nonscalar values and your code does produce the following warning in MySQL 8:

Level Code Message
Warning 1235 This version of MySQL doesn't yet support 'sorting of non-scalar JSON values'

Unfortunately there is nothing you can do except wait for MySQL to implement the said functionality. Or use a hack such as this which requires MySQL 8 JSON_TABLE to split json array into rows then pad the values and group concatenate them again to create a sortable string:

select *, (
    select group_concat(lpad(jt.v, 8, '0') order by jt.i)
    from json_table(t.array, '$[*]' columns(i for ordinality, v int path '$')) as jt
) as sort_str
from t
order by sort_str

Demo on db<>fiddle

Salman A
  • 262,204
  • 82
  • 430
  • 521
1

Here's a solution that offer:

  • negative numbers support

  • float numbers support

  • avoid typing long CTE queries*

* here the advantage is when you have to type queries frequently, CTE is nevertheless still a good option

All you have to do is select * from data order by json_weight(json_column,base_value);

In order to be able to do this, create those four functions json_max, json_weight, json_maxdigits and json_pad and use them in the order by clause:

delimiter //
create or replace function json_max(j json) returns float deterministic
  begin
    declare l int;
    declare mv float;
    declare v float;
    set l = json_length(j);
    for i in 0..l-1 do
      set v = abs(json_value(j,concat('$[',i,']')));
      if (mv is null) or (v > mv) then
        set mv = v;
      end if;
    end for;
    return mv;
  end
//
create or replace function json_weight(j json, base int) returns float deterministic
  begin
    declare l int;
    declare w float;
    set w = 0;
    set l = json_length(j);
    for i in 0..l-1 do
      set w = w + pow(base,-i) * json_value(j,concat('$[',i,']'));
    end for;
    return w;
  end
//
create or replace function json_maxdigits(j json) returns int deterministic
  return length(cast(floor(abs(json_max(j))) as char(16)))
//
create or replace function json_pad(j json, digitcount int) returns varchar(512) deterministic
  begin
    declare l int;
    declare v int;
    declare w varchar(512);
    set w = '';
    set l = json_length(j);
    for i in 0..l-1 do
      set v = json_value(j,concat('$[',i,']'));
      set w = concat(w, if(v>=0,'0','-'), lpad(v, digitcount, 0));
    end for;
    return w;
  end
//
delimiter ;

Then use them as follows:

select * from (
select json_array(2, 4) as `array`
union
select json_array(10) as `array`
union
select json_array(2, 3, 4) as `array`
union
select json_array(10, 11) as `array`
) data order by json_weight(`array`,max(json_max(`array`)) over ());
-- or if you know that 11 is the max value:
--) data order by json_weight(`array`,11);
-- alternative method:
--) data order by json_pad(`array`,max(json_maxdigits(`array`)) over ());
-- alternative method and you know that only two digits are enough to represent numbers in the array:
--) data order by json_pad(`array`,2);

Explanation:

json_max gives you the max absolute value in an json_array:

select json_max('[22,33,-55]'); -- 55

json_maxdigits gives you the max quantity of digits (of the absolute number) in an json_array:

select json_maxdigits('[21,151,-4]'); -- 3

json_weight converts your json array to a float-equivalent value, where each number of the array is equivalent to a digit in a number in the base you specify as argument:

select json_weight('[1,3,5,7]', 10); -- 1.357
select json_weight('[1,0,1]', 2); -- 1.25 (like binary floats)

json_pad converts your json array to a string of zero-padded numbers, with minus signal included as an extra symbol to guarantee negative ordering (or extra symbol 0 otherwise because + is less than - in ascii order):

select json_pad('[1,-3,15,7]', 2); --'001-03015007'

You can sort your query result set using float weights or padded strings. The two options are provided because:

  • float weights lose precision when you have long json arrays, but have float support
  • padded strings have great precision, here set to 512 digits, and you can even increase this number, however they don't offer float support (anyway you didn't ask for it).

If you use float weights you have to set the base. You set it manually or use the biggest number as base, which you get by using max(json_max(column_name)) over (). If you use base values which are less than this biggest value, you risk getting inconsistent results, if you use a number that is just too high you'll lose precision.

Similarly when using padded strings to sort, you have to provide the max number of digits to be consumed by the biggest absolute value (-35 would be 2 absolute digits).

Note: those functions work on early versions of MariaDB, that still didn't support json_table function.

brunoff
  • 4,161
  • 9
  • 10
1

If you can't make assumptions on the length of the array, and you don't want to use hacks like reformatting the array into a string of padded values, then you can't do this in a single query.

The expressions in the ORDER BY clause must be fixed before the query begins reading any rows, just like other parts of the query, for instance the columns of the select-list.

But you can use a query to generate a dynamic SQL query with enough terms in the ORDER BY clause to account for the greatest length array.

Demo:

create table mytable (array json);

insert into mytable values  ('[2, 3, 4]'), ('[2, 4]'), ('[10]'), ('[10, 11]');

select max(json_length(array)) as maxlength from mytable;
+-----------+
| maxlength |
+-----------+
|         3 |
+-----------+

Then make a recursive CTE that generates integers from 0 to the max length minus 1:

with recursive array as (
    select max(json_length(array)) as maxlength from mytable
),
num as (
    select 0 as num
    union
    select num+1 from num cross join array where num < maxlength-1
)   
select num from num;
+------+
| num  |
+------+
|    0 |
|    1 |
|    2 |
+------+

These integers can be used in formatting expressions to use in the ORDER BY clause:

with recursive array as (
    select max(json_length(array)) as maxlength from mytable
),
num as (
    select 0 as num
    union
    select num+1 from num cross join array where num < maxlength-1
)
select concat('CAST(JSON_EXTRACT(array, ', quote(concat('$[', num, ']')), ') AS UNSIGNED)') AS expr from num;
+-----------------------------------------------+
| expr                                          |
+-----------------------------------------------+
| CAST(JSON_EXTRACT(array, '$[0]') AS UNSIGNED) |
| CAST(JSON_EXTRACT(array, '$[1]') AS UNSIGNED) |
| CAST(JSON_EXTRACT(array, '$[2]') AS UNSIGNED) |
+-----------------------------------------------+

Then generate an SQL query with these expressions:

with recursive array as (
    select max(json_length(array)) as maxlength from mytable
),
num as (
    select 0 as num
    union
    select num+1 from num cross join array where num < maxlength-1
),
orders as (
    select num, concat('CAST(JSON_EXTRACT(array, ', quote(concat('$[', num, ']')), ') AS UNSIGNED)') AS expr from num
)
select concat(
    'SELECT array FROM mytable\nORDER BY \n  ',
    group_concat(expr order by num separator ',\n  '),
    ';'
) as query
from orders\G

query: SELECT array FROM mytable
ORDER BY 
  CAST(JSON_EXTRACT(array, '$[0]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[1]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[2]') AS UNSIGNED);

Then finally, capture the result from that query, and execute it as a new dynamic SQL query:

SELECT array FROM mytable
ORDER BY 
  CAST(JSON_EXTRACT(array, '$[0]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[1]') AS UNSIGNED),
  CAST(JSON_EXTRACT(array, '$[2]') AS UNSIGNED);
+-----------+
| array     |
+-----------+
| [2, 3, 4] |
| [2, 4]    |
| [10]      |
| [10, 11]  |
+-----------+
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828