19

I need to load some test data into the Channel field on my Account table. The Channel can be one of 10 different values, so I thought I'd randomly assign the Channel one of the values using a CASE expression along with ABS(CHECKSUM(NewId())) % 10 like so:

SELECT 
    id,
    name,
    Channel = 
      CASE ABS(CHECKSUM(NewId())) % 10
        WHEN 0 THEN 'Baby Only'
        WHEN 1 THEN 'Club'
        WHEN 2 THEN 'Drug'
        WHEN 3 THEN 'Food'
        WHEN 4 THEN 'Internet'
        WHEN 5 THEN 'Liquidators'
        WHEN 6 THEN 'Mass'
        WHEN 7 THEN 'Military'
        WHEN 8 THEN 'Other'
        WHEN 9 THEN 'Speciality'
        ELSE '*NONE*'            -- How is this ever getting reached?
      END
FROM 
    retailshelf_nil...account A

Since I'm using modulo 10 I thought the only possible values should be 0-9. But when I run the above code, I'm finding that the ELSE clause is indeed being reached and that my data is coming up with 'NONE' on some records as shown:

id                  name    Channel
001L000000KpgFqIAJ  Acct1   *NONE*
001L000000KpgFrIAJ  Acct2   Mass
001L000000KpgFsIAJ  Acct3   Club
001L000000KpgFtIAJ  Acct4   *NONE*
001L000000KpgFuIAJ  Acct5   Baby Only
001L000000KpgFvIAJ  Acct6   *NONE*
001L000000KpgFwIAJ  Acct7   Mass

Can someone please explain what logical error I've made that's allowing the ELSE clause to be reached?

When I run a simple test to just generate the random number like so:

SELECT
    RadomNum = ABS(CHECKSUM(NewId())) % 10
FROM 
    retailshelf_nil...account A
ORDER BY 
    1

All the numbers generated are from 0-9 as expected, so what's different about this first SQL?

And is there a workaround to ensure that ELSE is never reached?

Decoded
  • 159
  • 1
  • 10
PaulStock
  • 11,053
  • 9
  • 49
  • 52
  • 2
    When `ABS(CHECKSUM(NewId()))` returns NULL, ELSE will execute. This will happen when NewId() returns NULL. – DwB Feb 20 '14 at 20:16
  • 2
    @DwB, thanks for your response, but why would NEWID() ever return NULL? – PaulStock Feb 20 '14 at 20:17
  • If you are just looking for a random value and you are getting values coming up other than else, why not just roll else up into one of your other values, so else would be 'Specialty' as well as 9, sure this could mess up the deviation, but that may not matter. –  Feb 20 '14 at 20:28
  • @Phaeze thanks for the suggestion, and I did consider doing just what you suggested, but I am still perplexed as to why the `CASE` statement is behaving this way and would like to know the answer regardless of how I end up solving my particular problem. – PaulStock Feb 20 '14 at 20:32
  • 7
    [There is an answer that explains exactly why this happens](http://stackoverflow.com/a/21918828/61305) - `NEWID()` is re-evaluated in every branch of the CASE expression, and your calculation could just so happen to be `1 0 0 0 0 0 0 0 0 0`, for example, for some rows, which leads to the else. – Aaron Bertrand Feb 20 '14 at 20:34
  • @PaulStock I dont know how or why newid() would return null. Thus a comment instead of answer. – DwB Feb 20 '14 at 20:34
  • 2
    @PaulStock my line of thinking is wrong. newid() should never return null. – DwB Feb 20 '14 at 20:39
  • [Blogged about this here](http://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression). – Aaron Bertrand Jun 12 '14 at 15:55

3 Answers3

46

The written form of the query is expanded to:

Channel = 
      CASE
        WHEN ABS(CHECKSUM(NewId())) % 10 = 0 THEN 'Baby Only'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 1 THEN 'Club'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 2 THEN 'Drug'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 3 THEN 'Food'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 4 THEN 'Internet'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 5 THEN 'Liquidators'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 6 THEN 'Mass'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 7 THEN 'Military'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 8 THEN 'Other'
        WHEN ABS(CHECKSUM(NewId())) % 10 = 9 THEN 'Speciality'
        ELSE '*NONE*'            -- How is this ever getting reached?
      END

A new value for NEWID is used in each test.

  • 2
    That explains it then – Lamak Feb 20 '14 at 20:29
  • I knew it had to be something fundamental related to the way the `CASE` statement works that I was messing up. – PaulStock Feb 20 '14 at 20:42
  • Well, this is quite a revelation, but... why? (Never mind, I'm just being rhetorical.) – Andriy M Feb 20 '14 at 20:42
  • So, the correct thing to do to then would be to call NewID() just once, store the resulting value in a variable and use such variable for the comparison, right? Quick question: is this "functions get evaluated multiple times" behaviour defined and/or explained somewhere? Does this happen with other functions besides those for IDs? – Joe Pineda Feb 21 '14 at 00:20
  • 4
    Knowing what we know now, it's interesting that the probability `*NONE*` should be reached is (1-0.1)^10. Or around 35% of the time. I would not have guessed that. – Michael J Swart Feb 23 '14 at 18:22
  • 2
    Yes, the probability is around `1/e` where `e` is the famous constant. A related problem is: "if you have 10 hats randomly distributed to 10 persons, what is the probability that no one gets his own hat?" – ypercubeᵀᴹ Mar 27 '14 at 20:09
5

A new "random" number will be calculated for every WHEN clause - you can instead use a derived table:

SELECT ID, Name, 
    Channel = 
      CASE Rand
        WHEN 0 THEN 'Baby Only'
        WHEN 1 THEN 'Club'
        WHEN 2 THEN 'Drug'
        WHEN 3 THEN 'Food'
        WHEN 4 THEN 'Internet'
        WHEN 5 THEN 'Liquidators'
        WHEN 6 THEN 'Mass'
        WHEN 7 THEN 'Military'
        WHEN 8 THEN 'Other'
        WHEN 9 THEN 'Speciality'
        ELSE '*NONE*'            -- How is this ever getting reached?
      END
FROM
(   SELECT 
         id,
         name,
         ABS(CHECKSUM(NewId())) % 10 Rand
    FROM   
        retailshelf_nil...account A
) zzz;

or a CROSS APPLY subquery:

SELECT A.ID, A.Name, 
    Channel = 
      CASE zzz.Rand
        WHEN 0 THEN 'Baby Only'
        WHEN 1 THEN 'Club'
        WHEN 2 THEN 'Drug'
        WHEN 3 THEN 'Food'
        WHEN 4 THEN 'Internet'
        WHEN 5 THEN 'Liquidators'
        WHEN 6 THEN 'Mass'
        WHEN 7 THEN 'Military'
        WHEN 8 THEN 'Other'
        WHEN 9 THEN 'Speciality'
        ELSE '*NONE*'            -- How is this ever getting reached?
      END
FROM
    retailshelf_nil...account A
CROSS APPLY
(   SELECT 
        ABS(CHECKSUM(NewId())) % 10
) zzz (Rand);

That way NewID() is called only once per record.

A similar scneario was resolved here.

The T-SQL documentation explains this phenomenon (granted it's for Sybase but apparently still applies to SQL Server):

Expressions that reference the rand function, the getdate function, and so on, produce different values each time they are evaluated. This can yield unexpected results when you use these expressions in certain case expressions. For example, the SQL standard specifies that case expressions with the form:

case expression
    when value1 then result1
    when value2 then result2
    when value3 then result3
...
end

are equivalent to the following form of case expression:

case expression
    when expression=value1 then result1
    when expression=value2 then result2
    when expression=value3 then result3
...
end
Andriy M
  • 76,112
  • 17
  • 94
  • 154
D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • This is not true. This only applies to RAND() which is treated as a numeric constant. Even if there was only one call to NEWID(), it would end up with the same value. What the user is experiencing is a value not in (0 .. 9). – CRAFTY DBA Feb 20 '14 at 20:36
  • 1
    @CRAFTYDBA: And which value would it be? Have you tried running that CASE? – Andriy M Feb 20 '14 at 20:40
  • 1
    @CRAFTYDBA I have replicated the results and validated the fix - the documentation I found indicates that it applies to _any_ operation, not just `RAND()`. – D Stanley Feb 20 '14 at 20:44
  • 1
    @CRAFTYDBA Also a non-null value MOD 10 _cannot_ product a number other than 0-9. – D Stanley Feb 20 '14 at 20:47
  • 2
    @CRAFTYDBA The issue is that while the `RAND()` would only be run once for the entire query, `NEWID()` is run for every test case in the case expression when it gets expanded. Thus, the `CASE` in the original question runs 10 instances of `NEWID()` for each row in the table. It's very possible that no generated UID will satisfy the condition of the case statement. The problem with `RAND()` is that it doesn't run often enough. The problem with `NEWID()` is that it runs too often. – Bacon Bits Feb 20 '14 at 20:50
  • @BaconBits, if you replace NEWID() with equivalent expression using RAND() you will run into the same issue. It is just the nature of CASE that requires that the case expression is run for each when condition. See [here](http://social.msdn.microsoft.com/Forums/sqlserver/en-US/3f430d6d-ebe6-4219-9c3e-27414a38130c/a-strange-query-behaviour-with-rand?forum=transactsql) – kristof Aug 12 '14 at 17:01
0

Related to second question,

CHECKSUM(NewId()) will return negative results sometimes, which is not match with any of the case conditions. If a negative number is divided with any number the result will be negative. Execute the following query,

declare @v nvarchar(50) = newid()
select CHECKSUM(@v),@v,CHECKSUM(@v) % 10
Bharadwaj
  • 2,535
  • 1
  • 22
  • 35