2

I have the following query

SELECT t.res, IF(t.res=0, "zero", "more than zero")
FROM (
SELECT table.*, IF (RAND()<=0.2,1, IF (RAND()<=0.4,2, IF (RAND()<=0.6,3,0))) AS res
FROM table LIMIT 20) t

which returns something like this:

result with LIMIT

That's exactly what you would expect. However, as soon as I remove the LIMIT 20 I receive highly unexpected results (there are more rows returned than 20, I cut it off to make it easier to read):

SELECT t.res, IF(t.res=0, "zero", "more than zero")
FROM (
SELECT table.*, IF (RAND()<=0.2,1, IF (RAND()<=0.4,2, IF (RAND()<=0.6,3,0))) AS res
FROM table) t

result without LIMIT

Side notes:
I'm using MySQL 5.7.18-15-log and this is a highly abstracted example (real query is much more difficult).
I'm trying to understand what is happening. I do not need answers that offer work arounds without any explanations why the original version is not working. Thank you.

Update: Instead of using LIMIT, GROUP BY id also works in the first case.

Update 2: As requested by zerkms, I added t.res = 0 and t.res + 1 to the second example result without LIMIT and two more columns

Horen
  • 11,184
  • 11
  • 71
  • 113
  • To reproduce, does your 'table' have exactly 20 rows, as it appears from your result set? – Steve Brandli Jan 24 '18 at 22:32
  • no, it has more rows – Horen Jan 24 '18 at 22:34
  • _Update:_ Instead of using `LIMIT`, `GROUP BY id` also works in the first case. – Horen Jan 24 '18 at 22:36
  • It's really hard to believe the second query produces that result set. – zerkms Jan 24 '18 at 22:38
  • 1
    I find it odd that your second result set, without the limit, only shows 20 rows. I grabbed a random table that I have that has 28 rows and tried your exact query, substituting my table name, and both worked as expected, with the second result containing 28 rows. – Steve Brandli Jan 24 '18 at 22:39
  • second example shows more rows. I just cut it off here to shorten my question. Sorry for the confusion. I'll add a note in the question – Horen Jan 24 '18 at 22:40
  • @SteveBrandli what MySQL version are you using? – Horen Jan 24 '18 at 22:42
  • Version 5.5.58. – Steve Brandli Jan 24 '18 at 22:43
  • Can you add 2 more columns to the second result set: `t.res = 0` and `t.res + 1`? – zerkms Jan 24 '18 at 22:44
  • @zerkms updated the question. please check. even weirder... – Horen Jan 24 '18 at 22:48
  • WAAAAAAAAAT o_O – zerkms Jan 24 '18 at 22:48
  • It looks like mysql recalculates the `t.res` value every time it sees it, to me it sounds like a bug. A possible solution (an ugly workaround): make a double nested query. `SELECT * FROM (SELECT table.*, ...)` – zerkms Jan 24 '18 at 22:49
  • I think it is a bug because I recently upgraded to 5.7.x and I didn't have that problem with 5.6.x (but it's also possible I just never noticed) – Horen Jan 24 '18 at 22:50
  • 1
    Unfortunately, there are some bugs around `rand()` introduced with some optimization in mysql 5.7. Have a look at e.g. [this example](https://stackoverflow.com/q/44336391/6248528). It looks like this is the same thing: MySQL knows `t.res` is a random value, and evaluates it again (with a different outcome) instead of using it from the subquery. `limit` creates a materlizes subquery, which is why it works as expected in teh first query (and making `limit 200000000` a a workaround) – Solarflare Jan 24 '18 at 22:51
  • Maybe try using RAND(1) instead of RAND() which returns the same random value for each call per entity. (https://www.w3resource.com/mysql/mathematical-functions/mysql-rand-function.php) – pBuch Jan 24 '18 at 22:53
  • @pBuch that does not work unfortunately – Horen Jan 24 '18 at 22:55
  • @Solarflare your link leads to the solution. The `LIMIT` workaround is also mentioned there. Thanks – Horen Jan 24 '18 at 22:56
  • @Horen how about double nesting the query? – zerkms Jan 24 '18 at 22:56
  • @zerkms I also tried that and it did not help. I think I'll go with the `GROUP BY id` solution for now – Horen Jan 24 '18 at 22:58
  • 1
    Thanks everyone, if someone wants to collect some rep, feel free to post the answer. – Horen Jan 24 '18 at 22:58

1 Answers1

2

The problem is caused by a change introduced in MySQL 5.7 on how derived tables in (sub)queries are treated.
Basically, in order to optimize performance, some subqueries are executed at different times and / or multiple times leading to unexpected results when your subquery returns non-deterministic results (like in my case with RAND()).
There are two easy (and likewise ugly) workarounds to get MySQL to "materialize" (aka return deterministic results) these subqueries: Use LIMIT <high number> or GROUP BY id both of which force MySQL to materialize the subquery and return the expected results.
The last option is turn off derived_merge in the optimizer_switch variable: derived_merge=off (make sure to leave all the other parameters as they are).

Further readings:
https://mysqlserverteam.com/derived-tables-in-mysql-5-7/
Subquery's rand() column re-evaluated for every repeated selection in MySQL 5.7/8.0 vs MySQL 5.6

Horen
  • 11,184
  • 11
  • 71
  • 113