T-SQL Dense_Rank() increments for nulls in group

Question

Good day, I am having trouble with my SQL query. I have a dataset that is representing nested objects ranging in depth from 3 items to 8 items. I am using dense_rank() over (Partition by [previous rank] to identify each item uniquely, but because of the nested nature of the data dense rank is not behaving the way I want.

Column A	Column B	Column C	Column D	Column E	Column F	Column G	Column H
System	Electric	SubSystem	DistributionSystem1	NULL	NULL	NULL	NULL
System	Electric	SubSystem	DistributionSystem2	NULL	NULL	NULL	NULL
System	Electric	SubSystem	Generators	GeneratorCategory	DieselGenerator	BackupGenerator	DieselGenerator1
System	Electric	SubSystem	Load Center	MCC	MCC1	Cubicle	Cubicle1
System	Electric	SubSystem	Load Center	MCC	MCC1	Cubicle	Cubicle2
System	Electric	SubSystem	Load Center	MCC	MCC2	Cubicle	Cubicle1
System	Electric	SubSystem	Load Center1	MCC	MCC2	Breaker	Breaker 1
System	Electric	SubSystem	Substation	SubStationCategory	SubStation1	Transformer	Transformer1
System	Cleaning	SubSystem	Cleaning Supply System	Pump	Pump 1 Cleaning Supply	NULL	NULL
System	Cleaning	SubSystem	Cleaning Supply System	Pump	Pump 1 Cleaning Supply	Motor	Motor Pump 1 Cleaning Supply
System	Cleaning	SubSystem	Cleaning Supply System	Pump	Pump 2 Cleaning Supply	NULL	NULL
System	Cleaning	SubSystem	Cleaning Supply System	Pump	Pump 2 Cleaning Supply	Motor	Motor Pump 2 Cleaning Supply
System	Cleaning	SubSystem	Cleaning Supply System	Structure	Cleaning Supply Pump Station	Meter	Flow Meter 1
System	Cleaning	SubSystem	Cleaning Waste System	Pump	Pump 1 Cleaning Waste	NULL	NULL
System	Cleaning	SubSystem	Cleaning Waste System	Pump	Pump 1 Cleaning Waste	Meter	Level Indicator Cleaning Waste
System	Cleaning	SubSystem	Cleaning Waste System	Pump	Pump 1 Cleaning Waste	Motor	Motor Pump 1 Cleaning Waste

I have tried many variations of the dense_rank() function

,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A]) AS [Column A ID]

Works fine, but since the value is always "system" that isn't all that confusing.

,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A],[Column B]) AS [Column B ID]

Also works But

,   case when [Column E] is null then NULL else DENSE_RANK() over (Partition by case when [Column E] is null then 0 end, [Column D] ORDER BY [Column E], [Column D], [Column C], [Column B], [Column A]) END AS [Column E ID]

Is where I run into problems.

Select
    [Column A]
,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A]) AS [Column A ID]
,   [Column B]
,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A],[Column B]) AS [Column B ID]
,   [Column C]
,   DENSE_RANK() over (Partition by [Column B] ORDER BY [Column A],[Column B], [Column C]) AS [Column C ID]
,   [Column D]
,   DENSE_RANK() over (Partition by [Column B]  ORDER BY [Column A],[Column B], [Column C], [Column D]) AS [Column D ID]
,   [Column E]
,   case when [Column E] is null then NULL else DENSE_RANK() over (Partition by [Column C],[Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E]) END AS [Column E ID]
,   [Column F]
,   case when [Column F] is null then NULL else DENSE_RANK() over (Partition by [Column E], [Column D], [Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F]) END AS [Column F ID]
,   [Column G]
,   case when [Column G] is null then NULL else DENSE_RANK() over (Partition by case when [Column F] is null then 0 end, [Column F], [Column C] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F], [Column G]) END AS [Column G ID]
,   [Column H]
,   case when [Column H] is null then NULL else DENSE_RANK() over (Partition by [Column G], [Column F], [Column E], [Column D],[Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F], [Column G], [Column H]) END AS [Column H ID]
From #tempDummyData
Order by [Column A], [Column B], [Column C], [Column D], [Column E], [Column F], [Column G], [Column H]

ResultsFixed

The Picture shows the results, and I've annotated in red the expected results. The idea is to be able to concatenate the numbers together to create a unique ID that tells what rank, and position each item holds. so row 1 would be 1 While row 4 would be 1.2.1.4.1.1.1.1 Is it possible to get Dense_rank() to ignore nulls, but still reset with each group?

A side note regarding your planned dot-numbering scheme. This is workable as long as non-null values will never be present after an earlier column has a null value. Otherwise, you may have ambiguities between cases like (1, 2, null) and (1, null, 2). — T N, Mar 21 '23 at 01:33
That is a good point, but yes nothing can exist without a parent so I will never have something in a column after a null. — Nathan Roberts, Mar 21 '23 at 22:54

T N · Accepted Answer · 2023-03-21T01:41:56.397

In addition to using a CASE expression suggested by Alan to suppress the rank results for null values, I expect you will also want to adjust the remaining values in the column. For DENSE_RANK(), that adjustment would be one if null values were present and zero otherwise.

No simple straightforward way of doing this comes to mind, but it can be done using another windowed SUM() function that wraps another CASE expression.

CASE WHEN ColX IS NOT NULL THEN
    DENSE_RANK()
        OVER(PARTITION BY ... ORDER BY ColX)
    - MAX(CASE WHEN ColX IS NULL THEN 1 ELSE 0 END)
        OVER(PARTITION BY ... ORDER BY ColX)
     END AS RankX

A somewhat klugy alternative that eliminates the adjustment, is to use ISNULL() to replace null values with a values that always sorts last and thus doesn't affect the lower DENSE_RANK() values.

CASE WHEN ColX IS NOT NULL THEN
    DENSE_RANK()
        OVER(PARTITION BY ... ORDER BY ISNULL(ColX, 'ZZZMaxValue'))
     END AS RankX

(ELSE NULL is implied in the above case statements when not specified.)

Sample results:

ColA	COlB	COlC	RankA	RankB	RankC
null	null	null	null	null	null
A	null	null	1	null	null
A	null	X	1	null	1
A	null	X	1	null	1
A	J	null	1	1	null
A	J	X	1	1	1
A	J	Y	1	1	2
A	J	Y	1	1	2
A	J	Z	1	1	3
A	K	X	1	2	1
A	K	Y	1	2	2
B	J	null	2	1	null
B	J	X	2	1	1
B	J	Y	2	1	2
B	K	Z	2	2	1

See this db<>fiddle for a demo containing both forms.

score 0 · Answer 2 · answered Mar 20 '23 at 22:44

Don't you need to partition by all the preceding columns? This would also mean you only have to order by the column you are evaluating

For example [Column E ID]

would be

,   CASE WHEN [Column E] IS NULL THEN NULL ELSE DENSE_RANK() over (Partition by [Column A],[Column B], [Column C], [Column D] ORDER BY  [Column E]) END AS [Column E ID]

there's also a typo I'm guessing where you have [column C id] twice, I guess this should have been [column d id]. I did not correct this for you as it would not match the image so might be confusing to others

OH thanks! that will totally clean up the code and make it easier to read. I see the Typo, I'll fix the screenshot and the text so it matches — Nathan Roberts, Mar 21 '23 at 22:58

T-SQL Dense_Rank() increments for nulls in group

2 Answers2