0

In SQL Server, I'm trying to calculate the count of days since the same weather as today (let's assume today is 6th August 2018) was observed first in the past 5 days. Per town.

Here's the data:

+---------+---------+--------+--------+--------+
| Date    | Toronto | Cairo  | Zagreb | Ankara |
+---------+---------+--------+--------+--------+
| 1.08.18 | Rain    | Sun    | Clouds | Sun    |
| 2.08.18 | Sun     | Sun    | Clouds | Sun    |
| 3.08.18 | Rain    | Sun    | Clouds | Rain   |
| 4.08.18 | Clouds  | Sun    | Clouds | Clouds |
| 5.08.18 | Rain    | Clouds | Rain   | Rain   |
| 6.08.18 | Rain    | Sun    | Sun    | Sun    |
+---------+---------+--------+--------+--------+

This needs to perform well but all I came up with so far is single queries for each town (and there are going to be dozens of towns, not just the four). This works but is not going to scale.

Here's the one for Toronto...

SELECT 
    DATEDIFF(DAY, MIN([Date]), GETDATE()) + 1 
FROM
    (SELECT TOP 5 * 
     FROM Weather 
     WHERE [Date] <= GETDATE()
     ORDER BY [Date] DESC) a
WHERE 
    Toronto = (SELECT TOP 1 Toronto 
               FROM Weather
               WHERE DataDate = GETDATE())

...which correctly returns 4 since today there is rain and the first occurrence of rain within the past 5 days was 3rd August.

But what I want returned is a table like this:

+---------+-------+--------+--------+
| Toronto | Cairo | Zagreb | Ankara |
+---------+-------+--------+--------+
| 4       | 5     | 1      | 5      |
+---------+-------+--------+--------+

How is this possible?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Peete
  • 125
  • 11
  • 2
    I hope it's not your actual table structure. – PM 77-1 Oct 23 '18 at 22:04
  • :) Heck no! But it reflects the scenario I'm facing - simplified. – Peete Oct 23 '18 at 22:05
  • 1
    I think you'll need a more normalised table structure e.g. "Date', 'Town', 'Weather' to accomplish that. – Dale K Oct 24 '18 at 00:29
  • I looked at transforming it to be vertical (within the query ideally) but I made no progress. Any hint of how to go about that? Cheers – Peete Oct 24 '18 at 00:33
  • it would have been better to include the query that produced the first pivot output – Paul Maxwell Oct 24 '18 at 01:25
  • The first output is the table that I'm dealing with. It is a bit different in reality but it's a table (not a pivot query), simplified to show exactly the scenario I'm confronted with. Hope that helps. – Peete Oct 24 '18 at 01:30

2 Answers2

2

You really don't want to be trying to do this on pivoted data, and while you state that the data isn't stored that way, you haven't shown us how you got to the pivot of cities as columns - which is a shame.

So, I have "unpivoted" the sample in a common table expression, then used an apply operator to count the prior occurrences of the same weather in 5 previous days. As it seems you know how to pivot I leave it to you to then pivot the final result.

with cte as (
        select
              date, city, weather
        FROM (
              SELECT * from mytable
             ) AS cp
        UNPIVOT (
                  Weather FOR City IN (Toronto, Cairo, Zagreb, Ankara)
            ) AS up
    )

select 
        date, city, weather, ca.prior
from cte
cross apply (
    select count(*) as prior
    from cte as prev 
    where prev.city = cte.city
    and prev.date between dateadd(day,-6,cte.date) and dateadd(day,-1,cte.date)
    and prev.weather = cte.weather
    ) ca

Using this sample data:

CREATE TABLE mytable(
   Date    date  NOT NULL
  ,Toronto VARCHAR(9) NOT NULL
  ,Cairo   VARCHAR(9) NOT NULL
  ,Zagreb  VARCHAR(9) NOT NULL
  ,Ankara  VARCHAR(9) NOT NULL
);
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180801','Rain','Sun','Clouds','Sun');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180802','Sun','Sun','Clouds','Sun');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180803','Rain','Sun','Clouds','Rain');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180804','Clouds','Sun','Clouds','Clouds');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180805','Rain','Clouds','Rain','Rain');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180806','Rain','Sun','Sun','Sun');

The query above produced this result:

+----+---------------------+---------+---------+-------+
|    |        date         |  city   | weather | prior |
+----+---------------------+---------+---------+-------+
|  1 | 01.08.2018 00:00:00 | Ankara  | Sun     |     0 |
|  2 | 02.08.2018 00:00:00 | Ankara  | Sun     |     1 |
|  3 | 03.08.2018 00:00:00 | Ankara  | Rain    |     0 |
|  4 | 04.08.2018 00:00:00 | Ankara  | Clouds  |     0 |
|  5 | 05.08.2018 00:00:00 | Ankara  | Rain    |     1 |
|  6 | 06.08.2018 00:00:00 | Ankara  | Sun     |     2 |
|  7 | 01.08.2018 00:00:00 | Cairo   | Sun     |     0 |
|  8 | 02.08.2018 00:00:00 | Cairo   | Sun     |     1 |
|  9 | 03.08.2018 00:00:00 | Cairo   | Sun     |     2 |
| 10 | 04.08.2018 00:00:00 | Cairo   | Sun     |     3 |
| 11 | 05.08.2018 00:00:00 | Cairo   | Clouds  |     0 |
| 12 | 06.08.2018 00:00:00 | Cairo   | Sun     |     4 |
| 13 | 01.08.2018 00:00:00 | Toronto | Rain    |     0 |
| 14 | 02.08.2018 00:00:00 | Toronto | Sun     |     0 |
| 15 | 03.08.2018 00:00:00 | Toronto | Rain    |     1 |
| 16 | 04.08.2018 00:00:00 | Toronto | Clouds  |     0 |
| 17 | 05.08.2018 00:00:00 | Toronto | Rain    |     2 |
| 18 | 06.08.2018 00:00:00 | Toronto | Rain    |     3 |
| 19 | 01.08.2018 00:00:00 | Zagreb  | Clouds  |     0 |
| 20 | 02.08.2018 00:00:00 | Zagreb  | Clouds  |     1 |
| 21 | 03.08.2018 00:00:00 | Zagreb  | Clouds  |     2 |
| 22 | 04.08.2018 00:00:00 | Zagreb  | Clouds  |     3 |
| 23 | 05.08.2018 00:00:00 | Zagreb  | Rain    |     0 |
| 24 | 06.08.2018 00:00:00 | Zagreb  | Sun     |     0 |
+----+---------------------+---------+---------+-------+

For count of days since the first occurrence (within the past 5 days)

select 
        date, city, weather, datediff(day,ca.prior,cte.date) as prior
from cte
cross apply (
    select min(prev.date) as prior
    from cte as prev 
    where prev.city = cte.city
    and prev.date between dateadd(day,-6,cte.date) and dateadd(day,-1,cte.date)
    and prev.weather = cte.weather
    ) ca
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • He has a wrong desired results, for ankara it should be 3 instead of 2. – Ilyes Oct 24 '18 at 01:29
  • No @Sami. 2 is correct as only the last 5 rows are relevant. – Peete Oct 24 '18 at 01:34
  • Not sure what you mean by "you haven't shown us how you got to the pivot of cities as columns". This is a table. So it's `select * from CityWeather` – Peete Oct 24 '18 at 01:36
  • @Peete Then how Toronto become 4? huh? – Ilyes Oct 24 '18 at 01:36
  • Good point, @Sami. And sorry for the confusion. To clarify: It's about how many days have passed since the first occurrence of the same weather as today but only within the past 5 days. So ignore all records from 6+ days ago. From the remaining 5, find the first record with the same weather and return the difference in days from that day to now. So in my desired result, Ankara is indeed incorrect. It should be 5. My bad. I updated the original question. – Peete Oct 24 '18 at 01:44
  • Thanks @used_by_already. This is really close to what I'm after. I'm now working my brain trying to figure out how to show the count of days since the first occurrence (within the past 5 days) rather than the count of the occurrences as your query shows. On 3rd Aug for Toronto that would be 2 rather than 1 as the first occurrence is on 1st Aug (ie count 1st Aug and 2nd Aug). Hope that explanation makes sense. Thanks so much. – Peete Oct 24 '18 at 01:56
  • I have added the logic (for count of days since first occurence) as an option – Paul Maxwell Oct 24 '18 at 02:09
  • Thanks heaps. I'll fine tune it now to fit my scenario. – Peete Oct 24 '18 at 02:57
  • There's an update in my requirements but I think it's a new question: https://stackoverflow.com/questions/52997121/sql-count-of-rows-since-certain-value-first-occurred-keep-counting @Sami – Peete Oct 25 '18 at 19:57
1

I think you want something like

CREATE TABLE T
(
  [Date] DATE,
  Toronto VARCHAR(45),
  Cairo VARCHAR(45),
  Zagreb VARCHAR(45),
  Ankara VARCHAR(45)
);

INSERT INTO T VALUES
('2018-08-01', 'Rain', 'Sun', 'Clouds', 'Sun'),
('2018-08-02', 'Sun', 'Sun', 'Clouds', 'Sun'),
('2018-08-03', 'Rain', 'Sun', 'Clouds', 'Rain'),
('2018-08-04', 'Clouds', 'Sun', 'Clouds', 'Clouds'),
('2018-08-05', 'Rain', 'Clouds', 'Rain', 'Rain'),
('2018-08-06', 'Rain', 'Sun', 'Sun', 'Sun');

SELECT 
  (SELECT MAX(Occ) FROM (SELECT COUNT(Toronto) Occ FROM T WHERE Toronto = (select top 1 toronto from t order by date desc) GROUP BY Toronto) T) Toronto,
  (SELECT MAX(Occ) FROM (SELECT COUNT(Cairo) Occ FROM T WHERE Cairo = (select top 1 Cairo from t order by date desc) GROUP BY Cairo) T) Cairo,
  (SELECT MAX(Occ) FROM (SELECT COUNT(Zagreb) Occ FROM T WHERE Zagreb = (select top 1 Zagreb from t order by date desc)GROUP BY Zagreb) T) Zagreb,
  (SELECT MAX(Occ) FROM (SELECT COUNT(Ankara) Occ FROM T WHERE Ankara = (select top 1 Ankara from t order by date desc)GROUP BY Ankara) T) Ankara

Returns

+----+---------+-------+--------+--------+
|    | Toronto | Cairo | Zagreb | Ankara |
+----+---------+-------+--------+--------+
|  1 |       4 |     5 |      1 |      3 |
+----+---------+-------+--------+--------+

Demo

Ilyes
  • 14,640
  • 4
  • 29
  • 55
  • Thanks. What I'm trying to avoid is specifying each town separately in its own sub-query. There are dozens and the list grows. Also this returns the count of the weather type that occurs most, not the ones that match today's weather (e.g. 'Clouds' for Zagreb when it should count 'Sun'). It still may help me finding the full solution myself. Cheers – Peete Oct 24 '18 at 01:19
  • @Peete Check updates, but for ankara the last weather is sun so it should be 3 instead of 2. – Ilyes Oct 24 '18 at 01:21
  • Getting closer, great. However Ankara is 2 as only the last 5 rows are relevant. But that's a small change. – Peete Oct 24 '18 at 01:33