1

Are Postgres SELECT DISTINCT queries deterministic?

Will SELECT DISTINCT somecolumn FROM sometable return the same result (including order) if the table (and entire database) goes unchanged?

In the Select Query Documentation the Description section notes:

If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.

In the DISTINCT ON clause section they add:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

Generally, is this still true when the database goes un-changed?

DannyDannyDanny
  • 838
  • 9
  • 26
  • The **only** (really: the _only_) way to get a deterministic and stable order of the rows is `ORDER BY`. There is no exception. –  Dec 19 '19 at 13:27
  • Since you mentioned *return the same result (**including order**)* the answer is no, there is no guarantee if the results will be same (**including order**) for two executions of the query. – Salman A Dec 19 '19 at 13:32

1 Answers1

3

This answer assumes that the expressions in the select are deterministic. Otherwise, the question seems trivial.

The ordering is not specified, so it could change between runs of the query -- or on a different system. However, the result set should be the same.

Your second quote from the documentation is for distinct on. That is not-deterministic, unless you are using a stable sort.

Note: You might get non-deterministic results if you are using a case-insensitive collation. The built-in collations are case-sensitive; and case insensitivity means that the original expressions are not deterministic.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • What about `select distinct ci_collation_column from...`? Can it return ABC sometimes, and abc sometimes? – jarlh Dec 19 '19 at 13:09
  • What's the reason that it *could change*? Would it use some randomness? – DannyDannyDanny Dec 19 '19 at 13:17
  • @DannyDannyDanny . . . `distinct on` is not related to your question. But when you use `order by` and the keys are the same, the database ordering can put the rows with the same key value in any order. This can occur because of innate randomness in the sorting (eg. quicksort algorithms), because of locks on rows/pages, because of timing in a multi-processor environment, and no doubt due to many other reasons. – Gordon Linoff Dec 19 '19 at 13:24
  • 1
    @DannyDannyDanny: `distinct` is usually done using a hashing algorithm which completely messes up any "natural" order of the values –  Dec 19 '19 at 13:29
  • @a_horse_with_no_name . . . But distinct applies to *all* columns, so the results are deterministic. I suppose you could be calling a non-deterministic function in an expression, which would mess things up. But for a reasonable interpretation of the question. – Gordon Linoff Dec 19 '19 at 13:34
  • They might look "deterministic" but I don't think you can rely on that unless an `order by` is used. –  Dec 19 '19 at 13:39
  • @a_horse_with_no_name fine that the order isn't natural, but surely the hashing function must be deterministic. Why would the order change from run to run if the universe being hashed does not change? – DannyDannyDanny Dec 19 '19 at 13:47
  • @a_horse_with_no_name, what about case insensitive columns, like 'ABC' vs. 'abc'? Ever/always deterministic? – jarlh Dec 19 '19 at 14:10