-1

Good day, I am having trouble with my SQL query. I have a dataset that is representing nested objects ranging in depth from 3 items to 8 items. I am using dense_rank() over (Partition by [previous rank] to identify each item uniquely, but because of the nested nature of the data dense rank is not behaving the way I want.

Column A Column B Column C Column D Column E Column F Column G Column H
System Electric SubSystem DistributionSystem1 NULL NULL NULL NULL
System Electric SubSystem DistributionSystem2 NULL NULL NULL NULL
System Electric SubSystem Generators GeneratorCategory DieselGenerator BackupGenerator DieselGenerator1
System Electric SubSystem Load Center MCC MCC1 Cubicle Cubicle1
System Electric SubSystem Load Center MCC MCC1 Cubicle Cubicle2
System Electric SubSystem Load Center MCC MCC2 Cubicle Cubicle1
System Electric SubSystem Load Center1 MCC MCC2 Breaker Breaker 1
System Electric SubSystem Substation SubStationCategory SubStation1 Transformer Transformer1
System Cleaning SubSystem Cleaning Supply System Pump Pump 1 Cleaning Supply NULL NULL
System Cleaning SubSystem Cleaning Supply System Pump Pump 1 Cleaning Supply Motor Motor Pump 1 Cleaning Supply
System Cleaning SubSystem Cleaning Supply System Pump Pump 2 Cleaning Supply NULL NULL
System Cleaning SubSystem Cleaning Supply System Pump Pump 2 Cleaning Supply Motor Motor Pump 2 Cleaning Supply
System Cleaning SubSystem Cleaning Supply System Structure Cleaning Supply Pump Station Meter Flow Meter 1
System Cleaning SubSystem Cleaning Waste System Pump Pump 1 Cleaning Waste NULL NULL
System Cleaning SubSystem Cleaning Waste System Pump Pump 1 Cleaning Waste Meter Level Indicator Cleaning Waste
System Cleaning SubSystem Cleaning Waste System Pump Pump 1 Cleaning Waste Motor Motor Pump 1 Cleaning Waste

I have tried many variations of the dense_rank() function

,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A]) AS [Column A ID]

Works fine, but since the value is always "system" that isn't all that confusing.

,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A],[Column B]) AS [Column B ID]

Also works But

,   case when [Column E] is null then NULL else DENSE_RANK() over (Partition by case when [Column E] is null then 0 end, [Column D] ORDER BY [Column E], [Column D], [Column C], [Column B], [Column A]) END AS [Column E ID]

Is where I run into problems.

Select
    [Column A]
,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A]) AS [Column A ID]
,   [Column B]
,   DENSE_RANK() over (Partition by [Column A] ORDER BY [Column A],[Column B]) AS [Column B ID]
,   [Column C]
,   DENSE_RANK() over (Partition by [Column B] ORDER BY [Column A],[Column B], [Column C]) AS [Column C ID]
,   [Column D]
,   DENSE_RANK() over (Partition by [Column B]  ORDER BY [Column A],[Column B], [Column C], [Column D]) AS [Column D ID]
,   [Column E]
,   case when [Column E] is null then NULL else DENSE_RANK() over (Partition by [Column C],[Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E]) END AS [Column E ID]
,   [Column F]
,   case when [Column F] is null then NULL else DENSE_RANK() over (Partition by [Column E], [Column D], [Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F]) END AS [Column F ID]
,   [Column G]
,   case when [Column G] is null then NULL else DENSE_RANK() over (Partition by case when [Column F] is null then 0 end, [Column F], [Column C] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F], [Column G]) END AS [Column G ID]
,   [Column H]
,   case when [Column H] is null then NULL else DENSE_RANK() over (Partition by [Column G], [Column F], [Column E], [Column D],[Column B] ORDER BY [Column A],[Column B], [Column C], [Column D], [Column E], [Column F], [Column G], [Column H]) END AS [Column H ID]
From #tempDummyData
Order by [Column A], [Column B], [Column C], [Column D], [Column E], [Column F], [Column G], [Column H]

ResultsFixed

The Picture shows the results, and I've annotated in red the expected results. The idea is to be able to concatenate the numbers together to create a unique ID that tells what rank, and position each item holds. so row 1 would be 1 While row 4 would be 1.2.1.4.1.1.1.1 Is it possible to get Dense_rank() to ignore nulls, but still reset with each group?

  • A side note regarding your planned dot-numbering scheme. This is workable as long as non-null values will never be present after an earlier column has a null value. Otherwise, you may have ambiguities between cases like (1, 2, null) and (1, null, 2). – T N Mar 21 '23 at 01:33
  • That is a good point, but yes nothing can exist without a parent so I will never have something in a column after a null. – Nathan Roberts Mar 21 '23 at 22:54

2 Answers2

1

In addition to using a CASE expression suggested by Alan to suppress the rank results for null values, I expect you will also want to adjust the remaining values in the column. For DENSE_RANK(), that adjustment would be one if null values were present and zero otherwise.

No simple straightforward way of doing this comes to mind, but it can be done using another windowed SUM() function that wraps another CASE expression.

CASE WHEN ColX IS NOT NULL THEN
    DENSE_RANK()
        OVER(PARTITION BY ... ORDER BY ColX)
    - MAX(CASE WHEN ColX IS NULL THEN 1 ELSE 0 END)
        OVER(PARTITION BY ... ORDER BY ColX)
     END AS RankX

A somewhat klugy alternative that eliminates the adjustment, is to use ISNULL() to replace null values with a values that always sorts last and thus doesn't affect the lower DENSE_RANK() values.

CASE WHEN ColX IS NOT NULL THEN
    DENSE_RANK()
        OVER(PARTITION BY ... ORDER BY ISNULL(ColX, 'ZZZMaxValue'))
     END AS RankX

(ELSE NULL is implied in the above case statements when not specified.)

Sample results:

ColA COlB COlC RankA RankB RankC
null null null null null null
A null null 1 null null
A null X 1 null 1
A null X 1 null 1
A J null 1 1 null
A J X 1 1 1
A J Y 1 1 2
A J Y 1 1 2
A J Z 1 1 3
A K X 1 2 1
A K Y 1 2 2
B J null 2 1 null
B J X 2 1 1
B J Y 2 1 2
B K Z 2 2 1

See this db<>fiddle for a demo containing both forms.

T N
  • 4,322
  • 1
  • 5
  • 18
0

Don't you need to partition by all the preceding columns? This would also mean you only have to order by the column you are evaluating

For example [Column E ID]

would be

,   CASE WHEN [Column E] IS NULL THEN NULL ELSE DENSE_RANK() over (Partition by [Column A],[Column B], [Column C], [Column D] ORDER BY  [Column E]) END AS [Column E ID]

there's also a typo I'm guessing where you have [column C id] twice, I guess this should have been [column d id]. I did not correct this for you as it would not match the image so might be confusing to others

Alan Schofield
  • 19,839
  • 3
  • 22
  • 35
  • OH thanks! that will totally clean up the code and make it easier to read. I see the Typo, I'll fix the screenshot and the text so it matches – Nathan Roberts Mar 21 '23 at 22:58