1

I'm working with an ndb cluster setup that uses proxysql. There are 4 mysql servers, 4 data nodes, and 2 management nodes. The following happens when I access one of the mysql servers directly, so I think that I can safely rule out proxysql as the root cause, but beyond that I'm just lost.

Here's a table I set up to help illustrate my problem:

mysql> describe delain;
+----------+-------------+------+-----+---------+----------------+
| Field    | Type        | Null | Key | Default | Extra          |
+----------+-------------+------+-----+---------+----------------+
| album_id | tinyint(2)  | NO   | PRI | NULL    | auto_increment |
| album    | varchar(30) | YES  |     | NULL    |                |
+----------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

It contains the following data; note that I specified an order by clause:

mysql> select * from delain order by album_id;
+----------+-------------------------+
| album_id | album                   |
+----------+-------------------------+
|        1 | Lucidity                |
|        2 | April Rain              |
|        3 | We Are the Others       |
|        4 | The Human Contradiction |
|        5 | Moonbathers             |
+----------+-------------------------+
5 rows in set (0.00 sec)

If I don't specify an order clause, the results returned are seemingly random, such as this:

mysql> select * from delain;
+----------+-------------------------+
| album_id | album                   |
+----------+-------------------------+
|        3 | We Are the Others       |
|        5 | Moonbathers             |
|        1 | Lucidity                |
|        2 | April Rain              |
|        4 | The Human Contradiction |
+----------+-------------------------+
5 rows in set (0.00 sec)

When I repeat the query (sans order clause) I get a different ordering pretty much every time. It doesn't seem to be truly random, but there sure as heck isn't any sort of discernible pattern to me.

Why is this happening? My experience with mysql has always been that the default ordering is essentially according to the primary key, but this is also the first time I've used an ndb cluster in particular; I don't know if there's a difference there, or if there's a setting inside a config file that got missed or what. Any help is greatly appreciated!

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
loungehead
  • 55
  • 5
  • 2
    As you just realize In every database rows aren't ordered datasets unless you specify the order. Order can vary because index, cluster, disk partition etc. Is just the database collecting the requested data the most efficient way he can. And because you didnt set `ORDER BY` he doesnt make the effort to order it and can get different result each time. – Juan Carlos Oropeza Sep 20 '17 at 19:47
  • That makes sense, and certainly matches other information I've seen when trying to figure out the solution to my problem, it just seemed counter-intuitive based on my prior experience. It must have just been confirmation bias, a difference in environment, or something similar, and I guess I can't rely on finding a broader, one-size-fits-all solution. Thank you very much for your very prompt response! – loungehead Sep 20 '17 at 20:00
  • As you didn't specify an `ORDER BY` clause, it shouldn't bother you in which order you are getting the result. If you want an order, use an `ORDER BY` clause, as this is the only way to guarantee a sorted result. Once queries get executed in parallel threads, it becomes very typical for result orders to vary with each execution for queries without an `ORDER BY`. – Thorsten Kettner Sep 20 '17 at 20:18

2 Answers2

1

This is standard SQL behavior.

https://mariadb.com/kb/en/library/sql-99/order-by-clause/ says in part:

An ORDER BY clause may optionally appear after a query expression: it specifies the order rows should have when returned from that query (if you omit the clause, your DBMS will return the rows in some random order).

(emphasis mine)

It'd be more accurate to say it will return the rows in some arbitrary order, instead of random order. Random implies that the order will change from one execution to the next.

  • In the case of InnoDB, the order tends to be the index order in which the rows were accessed. The index it reads is not necessarily the primary key. So the order is unchanging and somewhat predictable if you know something about the internals. But it's not random.

  • In the case of MyISAM, the order tends to be the order the rows are stored in the table, which can vary depending on the order the rows were inserted, and also depending on where there was space in the file at the time of insertion, after row deletions.

  • In the case of NDB, I don't know as much about its internals, so I can't describe its rule for "default" order, but it's still true that without an explicit ORDER BY, the storage engine is allowed to return rows in whatever order it wants to.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I think I can pretty confidently say that up until we spun up this cluster, I had always used InnoDB, which, based on what you said in your first bullet point, would explain why this has thrown me for such a loop. Well, if that's the way things are, then so be it -- my code will adapt to the environment, and not the other way around. – loungehead Sep 20 '17 at 20:04
0

For NDB the order depends on timing in the case of a SELECT * from table;

SELECT * from table is implemented as a parallelised full table scan within the data nodes and their database threads with one MySQL thread receiving results.

So with a filtered query like SELECT * from table where filter_column = 2; the filter gets evaluated in many threads in parallel. Each of those threads return rows to the MySQL thread in any order that depends on OS scheduler, networking and many other things. So there is no default ordering unless you use ORDER BY.

So for NDB order is truly random and not just arbitrary. You'll see this in all NDB test suites using MTR that queries mostly use SELECT * from table ORDER BY some_field;