-1

I want to partition by three columns in my query :

  1. user id
  2. cancelation month year.
  3. retention month year.

I used row number and partition by as follows

row_number() over (partition by user_id ,cast ( date_format(cancelation_date,'yyyyMM') as integer),cast ( date_format(retention_date,'yyyyMM') as integer) order by cast ( date_format(cancelation_date,'yyyyMM') as integer) asc, cast ( date_format(retention_date,'yyyyMM') as integer)   asc) as row_count

example of the output I got :

| user_id |cancelation_date |cancelation_month_year|retention_date|retention_month_year|row_count|
| -------- | -------------- |----------------------|--------------|--------------------|---------|
| 566      |      28-5-2020 |  202005              | 20-7-2020    | 202007             |1        |
| 566      |     28-5-2020  |  202005              | 30-7-2-2020  | 202007             |2        |

example of the output I want to get:

user_id cancelation_date cancelation_month_year retention_date retention_month_year row_count
566 28-5-2020 202005 20-7-2020 202007 1
566 28-5-2020 202005 30-7-2-2020 202007 1

note that user may have more than cancelation months, for example f he has canceled in August , I want row count =2 for all dates in August and so on.

it's not obvious why partition by is partitioning by retention date instead of partitioning by retention month year.

layal
  • 27
  • 6
  • I get the impression `row_number` is not what you want, rather you are interested in [`dense_rank`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.dense_rank.html?highlight=dense_rank), wherein you would get your expected output. – Nithish Nov 10 '21 at 10:52
  • that's right . dense_rank worked for me – layal Nov 10 '21 at 11:25
  • 1
    When you have a moment, if snithish answer worked for you, I'm sure they'd appreciate you marking it as correct. – Matt Andruff Nov 10 '21 at 16:40

1 Answers1

0

I get the impression that row_number is not what you want, rather you are interested in dense_rank, wherein you would get your expected output.

Nithish
  • 3,062
  • 2
  • 8
  • 16