1

I want to use rank() window function on a Redshift database to rank over specific, multiple columns. The code shall check those multiple columns per each row and assign same rank to rows that have identical values in ALL those columns.

Example image found in link below:

https://ibb.co/GJv1xQL

There are 18 distinct rows, however the rank shows 3 distinct rows, according to the ranking I wish to apply.

I tried :

select tbl.* , dense_rank() over (partition by secondary_id order by created_on, type1, type2, money, amount nulls last ) as rank from table tbl where secondary_id='92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6'

But the ranks assigned were wrong, and then I tried:

select tbl.* , dense_rank() over (partition by secondary_id,created_on, type1, type2, money, amount ) as rank from table tbl where secondary_id='92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6'

But this assigned rank=1 everywhere, in every row.

Sging
  • 27
  • 5
  • can you reformat your query and post the expected results. The issue could be missing / addition of a column in rank function . The rank is assigned 1 in every row since you are using only secondary_id as the aggregation layer which has the same value for all the rows in that column. What ever you will use in the partition by will be set as granularity level. if you run the code by over (partition by secondary_id, type1.....) it will show you rank values 1 and 2 only since now your granularity level looks for secondary_id and the distinct type 1 value – trillion Jun 08 '21 at 10:49
  • Hi @trillion thanks for answering. I had a typo in my description, actually I already tried what u suggest , to put in the partition by clause all the columns that I want. See below: select tbl.* , dense_rank() over (partition by secondary_id,created_on, type1, type2, money, amount ) as rank from table tbl where secondary_id='92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6' This assigns rank=1 in every row. Does not seem to work. – Sging Jun 08 '21 at 14:28

1 Answers1

1

I found how to solve this. The reason that the order by all the columns of interest was failing, is because the timestamp column contained different values in miliseconds, which was not obvious by viewing the data . So I only took into account the timestamp up until seconds and it worked! So I converted created_on column to date_trunc('s',cd.created_on) .

select tbl.* , dense_rank() over (partition by secondary_id order by date_trunc('s',created_on), type1, type2, money, amount nulls last ) as rank from table tbl where secondary_id='92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6'

Sging
  • 27
  • 5