SQL sectioning/averaging based on different timetag/timestamps and user-chosen input (T-SQL)

Question

I've got the following problem that I would like to cope with - I have a SQL dataset that I would like to section (e.g. like this one):

OldTimetag                 OldValue
2012-05-03 12:47:00        5
2012-05-03 13:00:00        1.3
2012-05-03 13:21:00        7
2012-05-03 14:56:00        5
2012-05-03 14:57:00        0.3
....                       ....

Now, I want to section (and/or average) the data based on a user-chosen interval - into new timetags, e.g. every 15 minutes with the first timetag as starting point, i.e.:

NewTimetag                 NewValue
2012-05-03 12:47:00        4.507
2012-05-03 13:02:00        1.3 
....                       ....

The main constraint is that the value next to the timetag is always valid, until the next timetag appears. So the value of 5 at timetag 2012-05-03 12:47:00 is valid for the next 13 minutes until 13:00:00. The value for the first 15 minutes from 12:47:00 would be (13*5+2*1.3)/15 = 4.507. In the next 15 minutes, at 13:02:00 the value is simply equal to 1.3... (and so on)

I've come so long, that it is a good idea to make an "artificial table" first, to later join it with the old table. I'm generating that table by:

DECLARE @intStart datetime, @intEnd datetime


SELECT @intStart =min(OldTimetag), @intEnd = MAX(OldTimetag)
FROM OldTable
    where OldTimetag between '2012-05-03 12:47:00' and '2012-05-03 14:57:00'

Declare @ArtificalTable table (NewTimeTag datetime, NewValue Float)
Declare @MinuteSlicer Int
Set @MinuteSlicer = 15
Insert @Hallo Select @intStart, null

While ( @intStart < @intEnd ) BEGIN
    Insert @ArtificalTable
    Select DATEADD(mi,@MinuteSlicer, @intStart), Null 
        Set  @intStart = DATEADD(mi,@MinuteSlicer,@intStart)
        If @intEnd <= DATEADD(mi,@MinuteSlicer,@intStart) 
            Break
End

This gives me an output like:

NewTimetag                 NewValue
2012-05-03 12:47:00        Null
2012-05-03 13:02:00        Null
....                       ....

However, I'm having problems with the next step, how to join the tables correctly - can anyone give me a hint?

Hint: JOIN to an artificial table with a row for every minute in the range you are querying. — Tab Alleman, Oct 23 '15 at 12:38
What is the input from the user? a list of newTime tags or just the first and last time tags and a number of minute such as 15? — Julien Vavasseur, Oct 23 '15 at 12:51
@JulienVavasseur the user input is only the number of minutes, such as e.g. 15. The first value, should be the first OldTimetag and the last value should be based on the minutes available for the last OldTimetag value — Jonas, Oct 23 '15 at 12:53
and it should slice the date between the 1st and last date by 15min range. — Julien Vavasseur, Oct 23 '15 at 12:56

Julien Vavasseur · Answer 1 · 2015-10-23T14:23:53.033

Here is one way of doing it.

Sample Data:

declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
    ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7)
    , ('2012-05-03 14:56:00', 5)
    , ('2012-05-03 14:57:00', 0.3);

Your custom range size in minutes:

declare @mins int = 15;

List is used to quickly compute an ordered list of number from 0 to n Where n <= to the number of minutes between the first and the last OldTimetag.

With list(n) as (
    Select top(Select 1+DATEDIFF(minute, min(OldTimetag), max(OldTimetag)) From @data) 
        ROW_NUMBER() over(order by (select 1))-1
    From (
        Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x3(n)
    ) as x(n)
)
Select NewTimetag = DATEADD(minute, @mins*(l.n/@mins), MIN(r.startTime)), NewValue = AVG(d.oldValue)
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data Where OldTimetag <= DATEADD(minute, n, startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Group By l.n/@mins

Cross Join is used to mix each number from the ordered list with the first OldTimetag from your data.
Cross Apply is used to get the nearest OldTimetag before each minute created with the Cross Join.
Inner Join then matches the nearest OldTimetag with your data in order to retrieved oldValue.
Select only have to calculate the average for each range on @mins minutes and its NewTimetag.

It works well for a range of up to 1000 minutes between the min and max OldTimetag. If you need to go beyond this limit, you can add a a 4th line in the list CTE:

Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x4(n) => up to 10.000
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x5(n) => up to 100.000
...

Thanks - even more for your comments/explanations! However, I'll have to think of a modification, because the computation process gets very immense, when I hit above the 10.000 (and in my application I'll unfortunately have to go beyond the 1.000.000) — Jonas, Oct 26 '15 at 09:00
Index on OldTimetag including OldValue may help. Which version on SQL Server are you using ? — Julien Vavasseur, Oct 26 '15 at 12:56

Cosmin Rus · Accepted Answer · 2015-10-26T10:00:17.383

One way is to determine the intervals (an interval is generated if it contains at least one timestamp), augment the time table with the next timestamp and then calculate the averages for each such interval by intersecting the intervals with the time table.

IF OBJECT_ID('tempdb..#values') IS NOT NULL DROP TABLE #values
CREATE TABLE #values (pk int identity, time datetime, value numeric(10,4))
INSERT INTO #values VALUES ('2012-05-03 12:47:00', 5)
INSERT INTO #values VALUES ('2012-05-03 13:00:00', 1.3)
INSERT INTO #values VALUES ('2012-05-03 13:21:00', 7)
INSERT INTO #values VALUES ('2012-05-03 14:56:00', 5)
INSERT INTO #values VALUES ('2012-05-03 14:57:00', 0.3)

DECLARE @timeSpanMinutes int SET @timeSpanMinutes=15
DECLARE @startTime datetime, @endTtime datetime
SELECT @startTime=MIN(time) FROM #values
SELECT @endTtime =DATEADD(MINUTE,(DATEDIFF(MINUTE,@startTime,MAX(time))
    /@timeSpanMinutes+1)*@timeSpanMinutes, @startTime) FROM #values -- MAX(time) multiple
SELECT intervals.start
  , SUM(value*(DATEDIFF(MINUTE -- minutes in intersection of [start,end] and [time,next]
     , CASE WHEN time<start THEN start ELSE time END -- Maximum(time,start)
     , CASE WHEN next<DATEADD(MINUTE,@timeSpanMinutes,intervals.start) THEN next 
            ELSE DATEADD(MINUTE,@timeSpanMinutes,intervals.start) END -- Minimum(next,end)
     )*1.0/@timeSpanMinutes)) as average
  FROM
  (SELECT DISTINCT DATEADD(MINUTE, (DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) AS start 
      FROM #values -- round start to multiple of @timeSpanMinutes
    UNION SELECT DISTINCT DATEADD(MINUTE,@timeSpanMinutes+(DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) 
      FROM #values -- union distinct with same as above but shifted with @timeSpanMinutes
  ) intervals -- intervals start time (end is calculated as start + @timeSpanMinutes)
  INNER JOIN 
  (SELECT v.*,ISNULL((SELECT MIN(time) FROM #values WHERE time>v.time),@endTtime) as next 
       FROM #values v -- add next column to #values
  ) vals
  ON vals.next>=intervals.start and vals.time<=DATEADD(MINUTE,@timeSpanMinutes,start)
  WHERE intervals.start<>@endTtime
  GROUP BY intervals.start
  ORDER BY intervals.start

Thanks for your answer, it works very smooth until a certain point - wheres the data in the results for e.g 13:32, 13:47, 14:02, 14:17 and 14:32? — Jonas, Oct 26 '15 at 09:25
Some intervals are not generated. The average should be the same value as for the previous interval. — Cosmin Rus, Oct 26 '15 at 09:39
The last interval is calculated as (7*9+5*1)/15. It should have included the last value for the timestamp too 0.3*5/15. I will fix the code. — Cosmin Rus, Oct 26 '15 at 09:56

SQL sectioning/averaging based on different timetag/timestamps and user-chosen input (T-SQL)

2 Answers2