1

I know that a deterministic ORDER BY is when you sort by a UNIQUE INDEX column, 1) is this true?.

I want to sort by the columns name and percentage.

There is a possibility that 2 or more students have the same name and the same percentage, therefore, the ORDER BY in that case would be non-deterministic because MySQL would define which one to put first and which one to put after (not me), 2) is this true?

So, my third question is (the main one), if I want to sort by several columns but still have a deterministic order, should I use a UNIQUE INDEX column at the end of all my columns, e.g.

ORDER BY name, percentage, id?

Considering that the id is the primary key

I still don't know if I understand deterministic and non-deterministic correctly.

gus
  • 123
  • 5
  • 3
    Yes, if you add a unique column as the last column in `ORDER BY` you will get a consistent, deterministic ordering. – Barmar Jun 12 '23 at 23:58
  • 1
    You not only should add that id column to the order by clause, you should add it to the select list as well, otherwise you can't even tell two students with the same name apart :) – Shadow Jun 13 '23 at 00:03
  • hi @Barmar, so, the answer to my 3 questions is "yes", i.e. is it ok as I understand it? – gus Jun 13 '23 at 00:05
  • Almost forgot :) @Shadow. I can put *any* number of columns, right?, the important thing is that they come with a `UNIQUE INDEX` at the end in the ORDER BY, right? – gus Jun 13 '23 at 00:12
  • 1
    Expanding slightly on Barmar's comment: You can order by something without selecting it. – Bohemian Jun 13 '23 at 00:34
  • @gus it depends - see Barmar's answer below – Shadow Jun 13 '23 at 00:47
  • hi @Bohemian, does this mean that we can literally sort by *any* column of the base tables involved (the `FROM`/`JOIN`s columns)? – gus Jun 13 '23 at 01:41
  • 1
    @gus More than that, you can sort by any *expression* – Bohemian Jun 13 '23 at 03:16
  • Excellent, thanks for the complementary information, I will do some tests about it. @Bohemian – gus Jun 13 '23 at 17:27

2 Answers2

3
  1. Not necessarily. If the results don't contain any duplicates in the ORDER BY columns, the order will be deterministic.
  2. Yes, if there are duplicates the the results are non-deterministic.
  3. Yes, adding a unique column to the ORDER BY makes it deterministic.

Whether this matters depends on your application. It can be important if you're doing query-based replication and you use INSERT INTO table SELECT .... Without deterministic ordering, you may get inconsistent AUTO_INCREMENT IDs between the master and replicas.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • "*Whether this matters depends on your application.*" - Ok, I need an HTML view to display the rows of a query (**a paginator**), so I think it is necessary that the order is *deterministic* to have consistent pages, right? @Barmar – gus Jun 13 '23 at 01:38
  • 1
    Yes. Although you can still get inconsistencies if the table is changed while paging. That's what happens if when you page in the Stack Exchange "hot questions" page. – Barmar Jun 13 '23 at 01:52
  • Of course, you are right, but this is "by nature" of the changing information, right? (unless we lock the tables, but that would be another topic) @Barmar – gus Jun 13 '23 at 01:58
  • 1
    Yes, that's inherent in viewing changing data. But some applications do pagination differently -- they make a snapshot, and you page through that. In this case, you don't have to worry about consistency across queries. – Barmar Jun 13 '23 at 15:00
  • Great, now it's clear to me. I didn't know about the "snapshot", I will investigate more about it, so far marked as best answer @Barmar – gus Jun 13 '23 at 17:26
  • Hi @Barmar, if I have multiple `JOIN`s (and therefore there will be *many columns* from multiple tables), can I use a `UNIQUE INDEX` from any table that is in the `FROM/JOIN` clause? – jwa Jul 01 '23 at 12:00
  • ..or, does it have to be a `UNIQUE INDEX` of a **certain table** of the `FROM/JOIN` clause? @Barmar – jwa Jul 01 '23 at 12:01
  • 1
    Indexes are not really relevant. What matters is whether there are any duplicates in the result set. The only relevance of indexes is that when a column is unique, you know that it won't have any duplicates. But when you start joining tables, you can create duplicates of unique columns if they join with multiple rows. – Barmar Jul 01 '23 at 19:06
  • "*But when you start joining tables, you can create duplicates of unique columns if they join with multiple rows*" - Exactly @Barmar, you have understood well the objective of my question. So, as a conclusion, could it be any `UNIQUE INDEX` of any `FROM/JOIN` table, or what would be your answer as a conclusion. For example, for a paginator – jwa Jul 01 '23 at 20:54
  • ... i.e., to create a deterministic ORDER BY @Barmar – jwa Jul 01 '23 at 20:55
  • 1
    The ORDER BY should contain all the columns that guarantee a unique ordering in the result. The specifics depend on the actual query (`GROUP BY` changes things, for instance). It's hard to give a generic answer when joins are involved. – Barmar Jul 01 '23 at 21:14
  • Ok, for example, a *primary key* (e.g. `users.id`) can appear several times in the result set if it has several associated records in another table, so MySQL would not know which one to put first if it will use ONLY `ORDER BY users.id`, is that what you mean? @Barmar – jwa Jul 01 '23 at 21:48
  • Exactly. So you might do `ORDER BY table1.id, table2.date`. `table2.date` doesn't necessarily have to be a unique key, but if there's are no duplicates for each ID it's enough for a consistent ordering. – Barmar Jul 03 '23 at 16:21
0

To answer your questions:

  1. Yes
  2. Yes, but to be more concrete: It's the quick sort algorithm which mysql uses to sort the data that causes the randomness.
  3. If you need a deterministic query then yes, this is a good way.
mexok
  • 89
  • 1
  • 8