4

I have an analytical function challenge on Bigquery that is messing with my mind. Sorry if I am missing any fundamental function here, but I couldn't find it.

Anyway I think this can lead to a good discussion.

I would like to get a rank among groups (dense_rank or row_number or something like that), but on a clustered fashion, which is the tricky bit.

For example, I would like to create clusters based not only on two partition columns (see image below) but based on the order among them as well. This is why I am calling it cluster. Each cluster should have the same rank if it's adjacent, but a different number if it is not (it was split by other cluster).

So, for cluster "a, x", all rows for the first cluster have number 1, then all rows for the second cluster have number 2, and so on.

How can I achieve this? Is there an analytical function out of the box for this or does this require a bit of auxiliary columns?

Thanks in advance.

example

spcvalente
  • 136
  • 3
  • 14

1 Answers1

2

Consider below approach

select partition_col_1, partition_col_2, order_col, 
  dense_rank() over(partition by partition_col_1, partition_col_2 order by group_number) output
from (
  select *, countif(group_start) over(order by order_col) group_number
  from (
    select *, 
      ifnull((partition_col_1, partition_col_2) != lag((partition_col_1, partition_col_2)) over(order by order_col), true) group_start
    from `project.dataset.table`
  )
)
# order by order_col             

if applied to sample data in your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230