10

I often see people answer MySQL questions with queries like this:

SELECT DAY(date), other columns
FROM table
GROUP BY DAY(date);

SELECT somecolumn, COUNT(*)
FROM table
HAVING COUNT(*) > 1;

I always like to give the column an alias and refer to that in the GROUP BY or HAVING clause, e.g.

SELECT DAY(date) AS day, other columns
FROM table
GROUP BY day;

SELECT somecolumn, COUNT(*) AS c
FROM table
HAVING c > 1;

Is MySQL smart enough to notice that the expressions in the later clauses are the same as in SELECT, and only do it once? I'm not sure how to test this -- EXPLAIN doesn't show any difference, but it doesn't seem to show how it's doing the grouping or filtering in the first place; it seems mainly useful for optimizing joins and WHERE clauses.

I tend to be pessimistic about MySQL optimization, so I like to give it all the help I can.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • Just in case you need an immediate answer, as a stop-gap until I can find an authoritative reference (and I suspect that it might have to be from the source), I'm pretty sure that the parser recognises calls to deterministic functions (and indeed, expressions) and caches the result for reuse in a query. – eggyal May 23 '14 at 08:36
  • +1 A coherent and useful question! –  May 23 '14 at 08:37
  • @eggyal I don't need anything immediately, it's more curiosity. I wonder whether I should recommend people improve their answers when I see this redundancy. I plan on continuing to write my queries the second way because I find them more readable. – Barmar May 23 '14 at 08:38
  • 1
    Barmar asking a question? I fear my compass has been irreversibly decalibrated. – Strawberry May 23 '14 at 08:40
  • This is actually [a dupe](http://stackoverflow.com/q/16152533) (and no doubt others - I'm pretty sure I've addressed this question before, albeit perhaps as a sub-part of another question), but I'm not convinced by the answers in that linked question and think this offers a better opportunity to produce a canonical reference. – eggyal May 23 '14 at 08:42
  • 2
    One simple demonstration of the fact that function results are cached is [`SELECT RAND() FROM table ORDER BY RAND()`](http://sqlfiddle.com/#!2/ba206/1/0) - since the resulting column is indeed sorted, MySQL must have used the same value for both calls to `RAND()`. – eggyal May 23 '14 at 09:33
  • Good one! I wonder if we can come up with a similar test for `GROUP BY`. – Barmar May 23 '14 at 09:36
  • Not sure how best to demonstrate that, but given `GROUP BY` is implemented by performing a sort... – eggyal May 23 '14 at 09:42
  • 1
    Am trying to put together a slightly more authoritative answer based on the optimizer source code - just thought that was worth mentioning for now. – eggyal May 23 '14 at 09:44
  • I highly doubt rand() is cached. What if you have two rand() functions? I just tried select rand(),rand() and the both cols contain different values – Oliver M Grech Sep 16 '21 at 19:37

2 Answers2

8

I think this can be tested using sleep() function,
for example take a look at this demo: http://sqlfiddle.com/#!2/0bc1b/1

Select * FROM t;

| X |
|---|
| 1 |
| 2 |
| 2 |

SELECT x+sleep(1)
FROM t
GROUP BY x+sleep(1);

SELECT x+sleep(1) As name
FROM t
GROUP BY name;

Execution times of both queries are about 3000 ms ( 3 seconds ).
There are 3 records in the table, and for each record the query sleeps for 1 second only,
so it means that the expression is evaluated only once for each record, not twice.

krokodilko
  • 35,300
  • 7
  • 55
  • 79
  • Further 'proof' is when you change _one_ of the sleeps to (2). Now it takes 9 seconds. – Rick James Nov 01 '16 at 22:21
  • `select x+sleep(1), count(*) from t group by x+sleep(1)` takes 6 seconds. What happens? Maybe the original example is just transformed to `select distinct x+sleep(1) from t`. – Paul Spiegel Nov 03 '16 at 02:07
  • Additionally: `select x+sleep(1) as c from t having c > 0` also takes 6 seconds. So i guess, every alias is internally replaced by the expression behind it and evaluated every time again. – Paul Spiegel Nov 03 '16 at 02:30
  • But... `SLEEP()` is not really "deterministic". `SQRT` is; `RAND` is not -- hence it would be 'ok' for `SQRT` to be remembered, but `RAND` may not. OTOH, `NOW()` is deliberately remembered, but `SYSDATE` is not. – Rick James Nov 03 '18 at 03:51
1

After consulting with one of the MySQL engineers, I proffer this lengthy answer.

  • Caching - no part of a query is 'remembered' for later use in that (or subsequent) query. (Contrast: the Query cache.)
  • Common subexpression elimination - no. This is a common Compiler technique, but MySQL does not use it. Example: (a-b)*(a-b) will do the subtract twice.
  • Removal of a constant from a loop - yes, with limitations. This is another Compiler technique.
  • A variety of SQL-centric hacks - yes; see below.
  • Re-evaluation of a subquery - it depends. Also, the Optimizer is gradually getting better.
  • VIEWs - it depends. There are still cases where a VIEW is destined to perform worse than the equivalent SELECT. Example: no condition pushdown into a UNION in a VIEW. Actually, this is more a matter of delayed action.
  • I think that some newer versions of MariaDB have a "subquery cache".

(Caveat: I do not have 100% confidence in any of my answer, but I do believe that most of it is correct, as of MySQL 5.7, MariaDB 10.1, etc)

Think of a multi-row SELECT as a loop. Many, maybe all, "deterministic" expressions are evaluated once. Example: Constant date expressions, even involving function calls. But...

NOW() is specifically evaluated once at the beginning of a query. Furthermore, the value is passed to Slaves when replicating. That is, by the time the query is stored on a slave, NOW() could be out of date. (SYSDATE() is another animal.)

Especially with the advent of only_full_group_by, GROUP BY needs to know if it matches the SELECT expressions. So, this looks for similar code.

HAVING and ORDER BY can use aliases from the SELECT list (unlike WHERE and GROUP BY). So SELECT expr AS x ... HAVING expr seems to reevaluate expr, but SELECT expr AS x ... HAVING x seems to reach for the already-evaluated expr.

The Windowing functions of MariaDB 10.2 have some pretty severe restrictions on where they can/cannot be reused; I don't have a complete picture of them yet.

Generally, none of this matters -- the re-evaluation of an expression (DATE(date) or even COUNT(*)) will get the same answer. Furthermore, the rummaging through the rows is usually much more costly than expression evaluation. So, unless you have a good stopwatch, you won't tell the difference.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Considering the demonstration in the other answer, does this mean that it considers `SLEEP(1)` to be a deterministic expression, so it evaluates it only once? – Barmar Nov 03 '16 at 00:53
  • Deterministic - no. Else the query would have taken 1 sec, not 3. I think `x+sleep(1)` comes under the weasel words I spoke on `GROUP BY`. Note that `GROUP BY x+sleep(2)` takes 9 seconds; I don't know if `x` gets involved in the `only_full_group_by` checking. – Rick James Nov 03 '16 at 01:07