2

Suppose I have this table :

+------------------------------------+
|         T_BOULEVERSEMENT           |
+---------------------+--------------+
| PK_A_BOULEVERSEMENT | I_OCCURRENCE |
+---------------------+--------------+
|         1           |      3       |
+---------------------+--------------+
|         2           |      5       |
+---------------------+--------------+
|         3           |      1       |
+---------------------+--------------+
|        ...          |     ...      |
+---------------------+--------------+
|         X           |      Y       |
+---------------------+--------------+

And I want to return the first row in which the sum of all the previous occurrences (I_OCCURRENCE) is greater than a random value.

The random value is comprised in the range [1 - SUM(I_OCCURRENCE)].

The following statement seems to work fine.

SELECT y.`PK_A_BOULEVERSEMENT`,
       y.`I_OCCURRENCE`
FROM (SELECT t.`PK_A_BOULEVERSEMENT`,
             t.`I_OCCURRENCE`,
             (SELECT SUM(x.`I_OCCURRENCE`)
              FROM   `T_BOULEVERSEMENT` x
              WHERE  x.`PK_A_BOULEVERSEMENT` <= t.`PK_A_BOULEVERSEMENT`) AS running_total
      FROM   `T_BOULEVERSEMENT` t
      ORDER BY t.`PK_A_BOULEVERSEMENT`) y
WHERE y.running_total >= ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1)
ORDER BY y.`PK_A_BOULEVERSEMENT`
LIMIT 1

But in really it mainly returns rows where PK_A_BOULEVERSEMENT is less than 10.

However, if I execute the following statement :

SELECT ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1)

The result seems to be uniform in the range [1 - SUM(I_OCCURRENCE)].

What can be wrong ?

Thanks

EDIT :

SQL Fiddle : http://sqlfiddle.com/#!2/b37d6/2

The desired result must be uniform in the range 1 - MAX(PK_A_BOULEVERSEMENT)

Loïc G.
  • 3,087
  • 3
  • 24
  • 36
  • Consider providing proper DDLs and/or and sqlfiddle TOGETHER WITH THE DESIRED RESULT (based upon a given variable) – Strawberry Dec 31 '13 at 15:07

2 Answers2

3

try this:

SET @random_sum = (SELECT ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1));

SELECT y.PK_A_BOULEVERSEMENT, SUM(x.I_OCCURRENCE) AS tot_occurence
    FROM T_BOULEVERSEMENT AS x, T_BOULEVERSEMENT AS y
    WHERE x.PK_A_BOULEVERSEMENT <= y.PK_A_BOULEVERSEMENT
    GROUP BY y.PK_A_BOULEVERSEMENT
    HAVING tot_occurence <= @random_sum

I had to use a temporary variable because mysql seems to recalculate rand() every row when using it in a where clause (so every row is compared to a different value).

With temporary variable I evaluate random number just before executing the query.

arilia
  • 9,373
  • 2
  • 20
  • 44
1

The cause of your problem is that the random number is being regenerated for each row in the subquery. Chances are that within the first 10 rows, you'll get a random number that's less than that row's running total. If we add the RAND() call and look at the subquery, it will look like this:

PK_A_.. I_OCC.. RUNNING_TOTAL   RNDM
      1     3          3          58
      2     1          4          30
      3     3          7          38
      4     1          8          33
      5     3         11          53
      6     3         14          40
      7     3         17          37
      8     3         20           1
      9     3         23          21
     10     1         24          39
     11     3         27           3
     12     1         28          23

We only have to go as far as row 8 to find a running_total that exceeds the random value. The solution is to get the random value once, as suggested in the other answer.

Joe
  • 6,767
  • 1
  • 16
  • 29