CTE Issues for Selecting Distinct Values and assigned NEWID() to those values

Question

I'm having an issue with some CTE code I'm working with. My current goal is that I have a table that has a 'Type' column. I want to select all the DISTINCT types from that 'Type' column and, for each type, assign a value of NEWID() to a separate column.

Heres a sample table that im starting with:

Type	NEW ID
1	NULL
1	NULL
4	NULL
4	NULL
4	NULL
MA	NULL
MA	NULL

    WITH unique_gen_id AS (
        SELECT DISTINCT type, NEWID() AS unique_id
        FROM tmp
        )

    UPDATE t
    SET t.unique_id = u.unique_id
    FROM tmp t  INNER JOIN
    unique_gen_id u ON t.type = u.type

This query almost works-- it assigns a "NEWID()" unique value to each respective "Type" with a few mishaps. (Type is not specific to either an int or character, could be anything).

Type	NEW ID
1	B280347A-C394-4656
1	B280347A-C394-4656
4	C03F0E24-7187-4CC2
4	D10415A8-55BD-4251
4	D10415A8-55BD-4251
MA	DBE92CA0-B440-484D
MA	DBE92CA0-B440-484D

As you can see, the query returned almost fine. It failed, however, with "Type" of '4' as it assigned 2 separate 'NEWIDS()' when its supposed to match all the way through.

It gets worse with different data-- I tried on different data using different 'Types' (For example, I had 100 records with Type of "1" that returned 100 unique IDs for each record when its supposed to be 1 NEWID() for all of Type "1", then a new NEWID() for a different type etc etc) and it was catastrophic.

You seem to be using SQL-Server, not MySQL. The `UPDATE` syntax is not correct for MySQL. — Barmar, Sep 09 '22 at 22:14
`DISTINCT type, NEWID()` won't do what you want. The `NEWID()` will be unique so DISTINCT certainly won't remove anything. If you want to be certain that `NEWID()` is only evaluated once per `type` materialise it into a temp table anyway. Otherwise you are subject to non deterministic and non guaranteed behaviour w.r.t. where a compute scalar gets evaluated in the execution plan. — Martin Smith, Sep 09 '22 at 22:21
Classic misuse of `DISTINCT` - nearly always using `DISTINCE` implies you're solving the problem in the wrong way. — Dale K, Sep 09 '22 at 22:21

score 0 · Accepted Answer · answered Sep 09 '22 at 22:20

The problem is that your CTE isn't returning distinct IDs. DISTINCT applies to the entire SELECT list, not just the following column. Since NEWID() returns a different ID for each row, you get duplicate types because they have different IDs.

Instead of SELECT DISTINCT, use GROUP BY type to get one row per type. Use an aggregation function such as MAX() or MIN() to pick one of the IDs.

WITH unique_gen_id AS (
    SELECT type, MAX(NEWID()) AS unique_id
    FROM tmp
    GROUP BY type
)

score 0 · Answer 2 · answered Sep 09 '22 at 22:29

You can do the update directly in the CTE with something like:

with u as (
    select *
    from t
    cross apply(
        select type, Max(NewId())
        from t t2
        where t2.type = t.type
        group by t2.type
    )n(t,nid)
)
update u set new_id = nid;

CTE Issues for Selecting Distinct Values and assigned NEWID() to those values

2 Answers2