Group by absorb NULL unless it's the only value

Question

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.

CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);

I'm trying to get output like

Foo OtherKeyId
A   NULL
B   1
C   1
C   2

This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.

I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work

-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
    SELECT *,
    ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
    FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1

This is for a modern SQL Server: Microsoft SQL Server 2019

score 1 · Answer 1 · answered Nov 06 '20 at 17:59

1

Hmmm . . . I think you want filtering:

select t.*
from #tempx1 t
where t.otherkeyid is not null or
      not exists (select 1
                  from #tempx1 t2
                  where t2.foo = t.foo and t2.otherkeyid is not null
                 );

answered Nov 06 '20 at 17:59

Gordon Linoff

1,242,037
58
646
786

score 1 · Accepted Answer · answered Nov 06 '20 at 18:11

You can use a GROUP BY expression with HAVING clause like below one

SELECT [Foo],[OtherKeyId]
  FROM #tempx1 t
 GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
    OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1

Demo

score 0 · Answer 3 · answered Nov 06 '20 at 18:55

My actual problem is a bit more complicated than presented here, I ended up using the idea from Barbaros Özhan solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BY. I'm able to get the results I need on my real dataset using a query like the following:

SELECT
    a.[Foo],
    b.[OtherKeyId]
FROM (
    SELECT
        [Foo],
        COUNT([OtherKeyId]) [C]
    FROM #tempx1 t
    GROUP BY [Foo]
) a
JOIN (
    SELECT
        [Foo],
        [OtherKeyId]
    FROM #tempx1 t
    GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
    (b.[OtherKeyId] IS NULL AND a.[C] = 0)
    OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Group by absorb NULL unless it's the only value

3 Answers3