7

Using Postgres 9.3, I'm trying to count the number of contiguous days of a certain weather type. If we assume we have a regular time series and weather report:

date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"

I want something count the contiguous days of the same weather. The results should look something like this:

date|weather|contiguous_days 
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2

I've been banging my head on this for a while trying to use windowing functions. At first, it seems like it should be no-brainer, but then I found out its much harder than expected.

Here is what I've tried...

Select date, weather, Row_Number() Over (partition by weather order by date)
  from t_weather

Would it be better just easier to compare the current row to the next? How would you do that while maintaining a count? Any thoughts, ideas, or even solutions would be helpful! -Kip

Kip
  • 97
  • 6

4 Answers4

4

You need to identify the contiguous where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: subtract a sequence of increasing numbers from the dates and it is constant for contiguous dates.

One you have the grouping, the rest is row_number():

Select date, weather,
       Row_Number() Over (partition by weather, grp order by date)
from (select w.*, 
             (date - row_number() over (partition by weather order by date) * interval '1 day') as grp
      from t_weather w
     ) w;

The SQL Fiddle is here.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This doesn't work: [SQL Fiddle](http://www.sqlfiddle.com/#!15/a0bcd/1). (I didn't downvote.) – Travis Feb 18 '16 at 22:03
  • 1
    I always forget the weirdnesses about Postgres date arithmetic. The edit works. I imagine that the downvote itself is simply malicious; the mistake was more a typo in the code than a logic error, and the logic should be correct. – Gordon Linoff Feb 18 '16 at 23:09
  • Tried the query but it's incorrect: the `row_number()` counts the total occurrence of a particular `weather`, without reset. For example, "Feb 5, Cloudy" should have row_number() 1 because "Feb 4" is "Snow". – Kenney Feb 19 '16 at 12:16
  • @Kenney . . . Ooops, the subquery should also have the `partition by weather`. – Gordon Linoff Feb 20 '16 at 00:06
  • 1
    Ah, that's a simple fix! Probably the most efficient answer with only two selects. – Kenney Feb 20 '16 at 11:02
  • Ah, thanks. You ran into the same issue that I had. I was missing the sub query partition by weather as well. – Kip Feb 22 '16 at 14:29
2

I'm not sure what the query engine is going to do when scanning multiple times across the same data set (kinda like calculating area under a curve), but this works...

WITH v(date, weather) AS (
VALUES 
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date, 
    weather, 
    CASE WHEN lag(weather) OVER () = weather THEN 1 ELSE 0 END change
FROM v)
SELECT date
    , weather
    ,(SELECT count(weather) -- number of times the weather didn't change
      FROM changes v2 
      WHERE v2.date <= v1.date AND v2.weather = v1.weather
        AND v2.date >= ( -- bounded between changes of weather
            SELECT max(date) 
            FROM changes v3 
            WHERE change = 0 
            AND v3.weather = v1.weather 
            AND v3.date <= v1.date)  --<-- here's the expensive part
    ) curve
FROM changes v1
Kirk Roybal
  • 17,273
  • 1
  • 29
  • 38
2

Here is another approach based off of this answer.

First we add a change column that is 1 or 0 depending on whether the weather is different or not from the previous day.
Then we introduce a group_nr column by summing the change over an order by date. This produces a unique group number for each sequence of consecutive same-weather days since the sum is only incremented on the first day of each sequence.
Finally we do a row_number() over (partition by group_nr order by date) to produce the running count per group.

select date, weather, row_number() over (partition by group_nr order by date)
from (
  select *, sum(change) over (order by date) as group_nr
  from (
    select *, (weather != lag(weather,1,'') over (order by date))::int as change
    from tmp_weather
  ) t1
) t2;

sqlfiddle (uses equivalent WITH syntax)

Community
  • 1
  • 1
Kenney
  • 9,003
  • 15
  • 21
1

You can accomplish this with a recursive CTE as follows:

WITH RECURSIVE CTE_ConsecutiveDays AS
(
    SELECT
        my_date,
        weather,
        1 AS consecutive_days
    FROM My_Table T
    WHERE
        NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
    UNION ALL
    SELECT
        T.my_date,
        T.weather,
        CD.consecutive_days + 1
    FROM
        CTE_ConsecutiveDays CD
    INNER JOIN My_Table T ON
        T.my_date = CD.my_date + INTERVAL '1 day' AND
        T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;

Here's the SQL Fiddle to test: http://www.sqlfiddle.com/#!15/383e5/3

Tom H
  • 46,766
  • 14
  • 87
  • 128