-1

Looking for ways to specify the start time of a PARTITION BY statement in SQL Server.

Partitioning a years worth of data into 60 minute segments. The data is 10 minute collections from an IOT device. Would like the partitions to start at 6AM each day.

How do I accomplish that fixed start time every day?

Here's some sample data. Want the windowing (partition) to start on the hour:

Sample data, 10 minute data sampling:

code datetimePDT data
AA01 12/15/2021 05:44 AM 0100
AA02 12/15/2021 05:54 AM 0105
AA03 12/15/2021 06:04 AM 0103
AA04 12/15/2021 06:14 AM 0109
AA05 12/15/2021 06:24 AM 0112
AA06 12/15/2021 06:34 AM 0115
AA07 12/15/2021 06:44 AM 0119
AA08 12/15/2021 06:54 AM 0125
AA09 12/15/2021 07:04 AM 0135
AA10 12/15/2021 07:14 AM 0155
AA11 12/15/2021 07:24 AM 0195

In a stored procedure - Ranking by minute: dense_rank() over (order by datepart(day,datetimePDT), datepart(hour,datetimePDT), datepart(minute,datetimePDT)) minuteRank

Grouping minutes into hours: CEILING((minuteRank-1)/10) hourGroup

Then doing things like pulling out the average: avg(data) over (partition by hourGroup) as GroupAVG

Prefer the hourGroup to start at 6 AM, so my GroupAVG is over the rows from 6:04 to 6:54, and the next partition is from 7-8 AM.

To make this more complicated, there may be missing data, so I can't rely on the data collection period being 10 minutes.

Want to get here:

code datetimePDT data minuteRank hourGroup
AA01 12/15/2021 05:44 AM 0100 01 NULL
AA02 12/15/2021 05:54 AM 0105 02 NULL
AA03 12/15/2021 06:04 AM 0103 03 0001
AA04 12/15/2021 06:14 AM 0109 04 0001
AA05 12/15/2021 06:24 AM 0112 05 0001
AA06 12/15/2021 06:34 AM 0115 06 0001
AA07 12/15/2021 06:44 AM 0119 07 0001
AA08 12/15/2021 06:54 AM 0125 08 0001
AA09 12/15/2021 07:04 AM 0135 09 0002
AA10 12/15/2021 07:14 AM 0155 10 0002
AA11 12/15/2021 07:24 AM 0195 11 0002
fip
  • 43
  • 5
  • 3
    So what about the data before `09:00`? You aren't very clearly here. Some sample data, expected results, *and* **your attempts** will help us help you. – Thom A Feb 14 '22 at 18:31
  • Ignoring the 9AM problem mentioned, you would need to partition by DATE and hour, correct? DATEPART will get the hour from a datetime (or similar) datatype. – SMor Feb 14 '22 at 18:49

2 Answers2

0

Maybe something like this?

Declare @testTable table (MyTestDate datetime);
 Insert Into @testTable (MyTestDate)
 Values ('2022-02-12 04:06:57.683')
      , ('2022-02-12 05:06:57.683')
      , ('2022-02-12 06:06:57.683')
      , ('2022-02-12 07:06:57.683')
      , ('2022-02-12 08:06:57.683')
      , ('2022-02-12 09:06:57.683')
      , ('2022-02-12 10:06:57.683')
      , ('2022-02-12 11:06:57.683')
      , ('2022-02-12 12:06:57.683')
      , ('2022-02-12 13:06:57.683')
      , ('2022-02-12 14:06:57.683')
      , ('2022-02-12 15:06:57.683')
      , ('2022-02-12 16:06:57.683')
      , ('2022-02-12 17:06:57.683')
      , ('2022-02-12 18:06:57.683')
      , ('2022-02-12 19:06:57.683')
      , ('2022-02-12 20:06:57.683')
      , ('2022-02-12 12:06:57.683')
      , ('2022-02-13 04:06:57.683')
      , ('2022-02-13 05:06:57.683')
      , ('2022-02-13 06:06:57.683')
      , ('2022-02-13 07:06:57.683')
      , ('2022-02-13 08:06:57.683')
      , ('2022-02-13 09:06:57.683')
      , ('2022-02-13 10:06:57.683')
      , ('2022-02-13 11:06:57.683')
      , ('2022-02-13 12:06:57.683')
      , ('2022-02-13 13:06:57.683')
      , ('2022-02-13 14:06:57.683')
      , ('2022-02-13 15:06:57.683')
      , ('2022-02-13 16:06:57.683')
      , ('2022-02-13 17:06:57.683')
      , ('2022-02-13 18:06:57.683')
      , ('2022-02-13 19:06:57.683')
      , ('2022-02-13 20:06:57.683')
      , ('2022-02-13 12:06:57.683');

Select *
     , row_number() Over(Partition By t.start_date Order By tt.MyTestDate)
  From @testTable tt
 Cross Apply (Values(dateadd(day, datediff(day, '09:00', tt.MyTestDate) - iif(datepart(hour, tt.MyTestDate) < 9, 1, 0), '09:00'))) As t(start_date);
Jeff
  • 512
  • 2
  • 8
0

What is the reasoning for PARTITION BY instead of just GROUP BY on the HOUR? You could play around with it obviously and put the aggregation into a CTE if you still want to see the individual values.

EDIT: Added a CTE and CASE expression in final select to partition as noted by Jeff.

DECLARE @Table TABLE (code VARCHAR(10), datetimePDT DATETIME, [data] INT)
INSERT INTO @Table VALUES
('AA01','12/15/2021 05:44 AM', 0100),
('AA02','12/15/2021 05:54 AM', 0105),
('AA03','12/15/2021 06:04 AM', 0103),
('AA04','12/15/2021 06:14 AM', 0109),
('AA05','12/15/2021 06:24 AM', 0112),
('AA06','12/15/2021 06:34 AM', 0115),
('AA07','12/15/2021 06:44 AM', 0119),
('AA08','12/15/2021 06:54 AM', 0125),
('AA09','12/15/2021 07:04 AM', 0135),
('AA10','12/15/2021 07:14 AM', 0155),
('AA11','12/15/2021 07:24 AM', 0195);

WITH DataAgg
AS
(
SELECT MIN(tt.code) AS FirstOfGroup,
    MAX(tt.code) AS LastOfGroup,
    COUNT(tt.code) AS NumberInGroup,
    DATEPART(YEAR,tt.datetimePDT) AS [DataYear],
    DATEPART(MONTH,tt.datetimePDT) AS [DataMonth],
    DATEPART(DAY,tt.datetimePDT) AS [DataDay],
    DATEPART(HOUR,tt.datetimePDT) AS [DataHour],
    AVG(tt.[data]) AS AvgData
FROM @Table tt
GROUP BY DATEPART(YEAR,tt.datetimePDT),DATEPART(MONTH,tt.datetimePDT),DATEPART(DAY,tt.datetimePDT), DATEPART(HOUR,tt.datetimePDT)
)

SELECT t.code,
    t.datetimePDT,
    t.data,
    d.AvgData AS 'HourAvg', 
    CASE WHEN DATEPART(HOUR,t.datetimePDT) >= 6 THEN CAST(t.datetimePDT AS DATE)
         WHEN DATEPART(HOUR,t.datetimePDT) BETWEEN 1 AND 5 THEN CAST(DATEADD(DAY,-1,t.datetimePDT) AS DATE) END AS 'DataDate'
FROM @Table t
    LEFT JOIN DataAgg d ON t.code BETWEEN d.FirstOfGroup AND d.LastOfGroup
FirstOfGroup LastOfGroup NumberInGroup DataYear DataMonth DataDay DataHour AvgData
AA01 AA02 2 2021 12 15 5 102
AA03 AA08 6 2021 12 15 6 113
AA09 AA11 3 2021 12 15 7 161
code datetimePDT data HourAvg DataDate
AA01 2021-12-15 05:44:00.000 100 102 2021-12-14
AA02 2021-12-15 05:54:00.000 105 102 2021-12-14
AA03 2021-12-15 06:04:00.000 103 113 2021-12-15
AA04 2021-12-15 06:14:00.000 109 113 2021-12-15
AA05 2021-12-15 06:24:00.000 112 113 2021-12-15
AA06 2021-12-15 06:34:00.000 115 113 2021-12-15
AA07 2021-12-15 06:44:00.000 119 113 2021-12-15
AA08 2021-12-15 06:54:00.000 125 113 2021-12-15
AA09 2021-12-15 07:04:00.000 135 161 2021-12-15
AA10 2021-12-15 07:14:00.000 155 161 2021-12-15
AA11 2021-12-15 07:24:00.000 195 161 2021-12-15
dogyog
  • 300
  • 2
  • 8
  • Good idea. Thanks for pointing out the forest through the trees! – fip Feb 15 '22 at 21:45
  • If the 'day' is defined as 9am to 9am - then grouping by date/hour will not be enough. You still need to identify the partition at the date level for each group, then you can calculate the first, second,... hour groups per date group and minute groups (looks like every 10 minutes). If the cutoff time is 9am - then anything in the 8am time would be hour group 24 for the previous day and not group 8 for the current day. – Jeff Feb 16 '22 at 17:13