I want to partition by three columns in my query :
- user id
- cancelation month year.
- retention month year.
I used row number and partition by as follows
row_number() over (partition by user_id ,cast ( date_format(cancelation_date,'yyyyMM') as integer),cast ( date_format(retention_date,'yyyyMM') as integer) order by cast ( date_format(cancelation_date,'yyyyMM') as integer) asc, cast ( date_format(retention_date,'yyyyMM') as integer) asc) as row_count
example of the output I got :
| user_id |cancelation_date |cancelation_month_year|retention_date|retention_month_year|row_count|
| -------- | -------------- |----------------------|--------------|--------------------|---------|
| 566 | 28-5-2020 | 202005 | 20-7-2020 | 202007 |1 |
| 566 | 28-5-2020 | 202005 | 30-7-2-2020 | 202007 |2 |
example of the output I want to get:
user_id | cancelation_date | cancelation_month_year | retention_date | retention_month_year | row_count |
---|---|---|---|---|---|
566 | 28-5-2020 | 202005 | 20-7-2020 | 202007 | 1 |
566 | 28-5-2020 | 202005 | 30-7-2-2020 | 202007 | 1 |
note that user may have more than cancelation months, for example f he has canceled in August , I want row count =2 for all dates in August and so on.
it's not obvious why partition by is partitioning by retention date instead of partitioning by retention month year.