1

I am having trouble narrowing down sales in top regions that occurred in consecutive months. I know I need to use some form of window function with Row_Number or Dense_Rank, but I am having trouble getting the final output

Here is my source data:

+--------+-----------+------------+
| Fruit  | SaleDate  | Top_Region |
+--------+-----------+------------+
| Apple  | 1/1/2017  |          1 |
| Apple  | 2/1/2017  |          1 |
| Apple  | 3/1/2017  |          1 |
| Apple  | 4/1/2017  |          0 |
| Apple  | 5/1/2017  |          0 |
| Apple  | 6/1/2017  |          0 |
| Apple  | 7/1/2017  |          1 |
| Apple  | 8/1/2017  |          1 |
| Apple  | 9/1/2017  |          1 |
| Apple  | 10/1/2017 |          1 |
| Apple  | 11/1/2017 |          0 |
| Apple  | 12/1/2017 |          0 |
| Banana | 1/1/2017  |          0 |
| Banana | 2/1/2017  |          0 |
| Banana | 3/1/2017  |          1 |
| Banana | 4/1/2017  |          1 |
| Banana | 5/1/2017  |          1 |
| Banana | 6/1/2017  |          1 |
| Banana | 7/1/2017  |          1 |
| Banana | 8/1/2017  |          1 |
| Banana | 9/1/2017  |          0 |
| Banana | 10/1/2017 |          1 |
| Banana | 11/1/2017 |          1 |
| Banana | 12/1/2017 |          0 |
+--------+-----------+------------+

This is the expected output:

+--------+-----------+-----------+-------+
| Fruit  |   Start   |    End    | Total |
+--------+-----------+-----------+-------+
| Apple  | 1/1/2017  | 3/1/2017  |     3 |
| Apple  | 7/1/2017  | 10/1/2017 |     4 |
| Banana | 3/1/2017  | 8/1/2017  |     6 |
| Banana | 10/1/2017 | 11/1/2017 |     2 |
+--------+-----------+-----------+-------+

The goal is to have instances of top region sales in succession with missing in one month.

So far I have tried a few different combinations, with this being the closest.

SELECT fruit,
        MIN(saledate) AS spanStart ,
        MAX(saledate) AS spanEnd,
        COUNT(*) AS spanLength
FROM    ( SELECT    s.* ,
                    ( ROW_NUMBER() OVER ( ORDER BY month )
                      - ROW_NUMBER() OVER ( PARTITION BY fruit, topregion ORDER BY month ) ) AS fruits
          FROM      #salesdata s
        ) s
GROUP BY fruit,fruits ,
        topregion
HAVING  topregion = 1
ORDER BY COUNT(*) DESC;

Any help would be greatly appreciated

GMB
  • 216,147
  • 25
  • 84
  • 135
UserX
  • 157
  • 13

1 Answers1

0

This is a typical gaps-and-island problem. One strategy is to identify the groups of adjacent rows groups by computing the difference between two row_number()s. We can then filter on groups having top_region = 1 and use aggregation to get the start date, end date and number of records per group.

Your query is really close, but the first row_number() is missing a partition by fruit in its over() clause. And I find that aliasing that column fruits where another column is called fruit is error prone.

select 
    fruit,
    min(sale_date) start_date,
    max(sale_date) end_date,
    count(*) total
from (
    select 
        t.*,
        row_number() over(partition by fruit order by sale_date) rn1,
        row_number() over(partition by fruit, top_region order by sale_date) rn2
    from mytable t
) t
where top_region = 1
group by fruit, rn1 - rn2
order by fruit, start_date

You can run the inner query separately to see the result it produces.

Demo on DB Fiddle:

fruit  | start_date | end_date   | total
:----- | :--------- | :--------- | ----:
Apple  | 2017-01-01 | 2017-01-03 |     3
Apple  | 2017-01-07 | 2017-01-10 |     4
Banana | 2017-01-03 | 2017-01-08 |     6
Banana | 2017-01-10 | 2017-01-11 |     2
GMB
  • 216,147
  • 25
  • 84
  • 135