25

I want to update 50% of the rows in a table, randomly selected. Is there any way to do that?

Edit: Just to clarify that it should always update 50% of the records, but of those 50% the rows must be randomly selected (not only the top 50% for instance). In other words, in avarage, every other record should be updated.

James
  • 311
  • 1
  • 4
  • 10

4 Answers4

47

Should work like that:

UPDATE table SET x = y WHERE RAND() < 0.5

Yep, tested it, works. But of course, it is only 50% of the rows on average, not exactly 50%.

As written in the SQL 92 specification, the WHERE clause must be executed for each tuple, so the rand() must be reevaluated yielding the intended result (instead of either selecting all or no rows at all).

Excerpt from the specification (emphasis mine):

General Rules

1) The <search condition> is applied to each row of T. The result of the <where clause> is a table of those rows of T for which the result of the <search condition> is true.

2) Each <subquery> in the <search condition> is effectively executed for each row of T and the results used in the application of the <search condition> to the given row of T. If any executed <subquery> contains an outer reference to a column of T, then the reference is to the value of that column in the given row of T.

gexicide
  • 38,535
  • 21
  • 92
  • 152
  • 1
    Surely that would update *all* rows half the time and none at all for the other half? – Widor Jun 18 '12 at 16:33
  • 2
    And there's another problem: even if it works as planned, it might not update any rows - or update all the rows. RAND() is random, you know. ) – raina77ow Jun 18 '12 at 16:34
  • If that does work, it only works on MySql (which is what the OP is using, I know) - definitely only `UPDATE`s all-or-nothing on MSSql. – Widor Jun 18 '12 at 16:38
  • well, it is still correct. You wrote "In other words, in avarage, every other record should be updated.". This is what is done here, ON AVERAGE :). Use raina77ow's answer if you want exactly 50%. However, the performance of his solution might not be too good for large tables (and due to the law of large numbers, my condition should get closer to 50% the bigger the table is). – gexicide Jun 18 '12 at 16:42
  • 1
    @Widor: Looked it up in the spec, quoted in in my answer. The behaviour I mentioned is correct. If Mssql compiles this query to a all-or-nothing execution, then Mssql does not implement the standard correctly (wouldn't be the first MS product that doesn't care about standards, cf. IE and W3C standards...) – gexicide Jun 18 '12 at 16:50
  • @Widor In fact it would work on other DBMSs as well. Of the popular products, only MSSQL doesn’t evaluate for every row. – Manngo Nov 27 '20 at 04:35
7

As I said, that's a long way, described in a sort of pseudocode. )

$x = SELECT COUNT(*) FROM some_table;
@ids = SELECT id FROM some_table ORDER BY RAND() LIMIT $x / 2;
UPDATE some_table WHERE id IN (@ids);
raina77ow
  • 103,633
  • 15
  • 192
  • 229
  • 2
    You can express that with one query. Then you can use standard SQL and without weird pseudocode extensions. – gexicide Oct 17 '12 at 09:30
0

UPDATE table SET volumnvalue = x WHERE RAND() <= 0.5 will result in very near to 50% of the records

0

RAND should be random and you will not get a solid percentile split.

It would be better to use the modulus operator % to find every X number of items. This does work best with unique id columns like a Primary Key.

Try running this query, be sure to specify your table name and id column name:

Selecting every 2nd row, divisible by 2 SELECT * from <your_table_name> where <id_column_name> %2=0

Selecting every 6th row, divisible by 6 SELECT * from <your_table_name> where <id_column_name> %6=0

Once you hare happy that the SELECT results look good, you can change the query with update syntax to update the records, using the same WHERE clause

Rudi Strydom
  • 4,417
  • 5
  • 21
  • 30