3

I've got the following problem that I would like to cope with - I have a SQL dataset that I would like to section (e.g. like this one):

OldTimetag                 OldValue
2012-05-03 12:47:00        5
2012-05-03 13:00:00        1.3
2012-05-03 13:21:00        7
2012-05-03 14:56:00        5
2012-05-03 14:57:00        0.3
....                       ....

Now, I want to section (and/or average) the data based on a user-chosen interval - into new timetags, e.g. every 15 minutes with the first timetag as starting point, i.e.:

NewTimetag                 NewValue
2012-05-03 12:47:00        4.507
2012-05-03 13:02:00        1.3 
....                       ....

The main constraint is that the value next to the timetag is always valid, until the next timetag appears. So the value of 5 at timetag 2012-05-03 12:47:00 is valid for the next 13 minutes until 13:00:00. The value for the first 15 minutes from 12:47:00 would be (13*5+2*1.3)/15 = 4.507. In the next 15 minutes, at 13:02:00 the value is simply equal to 1.3... (and so on)

I've come so long, that it is a good idea to make an "artificial table" first, to later join it with the old table. I'm generating that table by:

DECLARE @intStart datetime, @intEnd datetime


SELECT @intStart =min(OldTimetag), @intEnd = MAX(OldTimetag)
FROM OldTable
    where OldTimetag between '2012-05-03 12:47:00' and '2012-05-03 14:57:00'

Declare @ArtificalTable table (NewTimeTag datetime, NewValue Float)
Declare @MinuteSlicer Int
Set @MinuteSlicer = 15
Insert @Hallo Select @intStart, null

While ( @intStart < @intEnd ) BEGIN
    Insert @ArtificalTable
    Select DATEADD(mi,@MinuteSlicer, @intStart), Null 
        Set  @intStart = DATEADD(mi,@MinuteSlicer,@intStart)
        If @intEnd <= DATEADD(mi,@MinuteSlicer,@intStart) 
            Break
End 

This gives me an output like:

NewTimetag                 NewValue
2012-05-03 12:47:00        Null
2012-05-03 13:02:00        Null
....                       ....

However, I'm having problems with the next step, how to join the tables correctly - can anyone give me a hint?

Jonas
  • 308
  • 1
  • 11

2 Answers2

2

Here is one way of doing it.

Sample Data:

declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
    ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7)
    , ('2012-05-03 14:56:00', 5)
    , ('2012-05-03 14:57:00', 0.3);

Your custom range size in minutes:

declare @mins int = 15;

List is used to quickly compute an ordered list of number from 0 to n Where n <= to the number of minutes between the first and the last OldTimetag.

With list(n) as (
    Select top(Select 1+DATEDIFF(minute, min(OldTimetag), max(OldTimetag)) From @data) 
        ROW_NUMBER() over(order by (select 1))-1
    From (
        Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x3(n)
    ) as x(n)
)
Select NewTimetag = DATEADD(minute, @mins*(l.n/@mins), MIN(r.startTime)), NewValue = AVG(d.oldValue)
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data Where OldTimetag <= DATEADD(minute, n, startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Group By l.n/@mins
  • Cross Join is used to mix each number from the ordered list with the first OldTimetag from your data.
  • Cross Apply is used to get the nearest OldTimetag before each minute created with the Cross Join.
  • Inner Join then matches the nearest OldTimetag with your data in order to retrieved oldValue.
  • Select only have to calculate the average for each range on @mins minutes and its NewTimetag.

It works well for a range of up to 1000 minutes between the min and max OldTimetag. If you need to go beyond this limit, you can add a a 4th line in the list CTE:

Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x4(n) => up to 10.000
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x5(n) => up to 100.000
...
Julien Vavasseur
  • 3,854
  • 1
  • 20
  • 29
  • Thanks - even more for your comments/explanations! However, I'll have to think of a modification, because the computation process gets very immense, when I hit above the 10.000 (and in my application I'll unfortunately have to go beyond the 1.000.000) – Jonas Oct 26 '15 at 09:00
  • Index on OldTimetag including OldValue may help. Which version on SQL Server are you using ? – Julien Vavasseur Oct 26 '15 at 12:56
1

One way is to determine the intervals (an interval is generated if it contains at least one timestamp), augment the time table with the next timestamp and then calculate the averages for each such interval by intersecting the intervals with the time table.

IF OBJECT_ID('tempdb..#values') IS NOT NULL DROP TABLE #values
CREATE TABLE #values (pk int identity, time datetime, value numeric(10,4))
INSERT INTO #values VALUES ('2012-05-03 12:47:00', 5)
INSERT INTO #values VALUES ('2012-05-03 13:00:00', 1.3)
INSERT INTO #values VALUES ('2012-05-03 13:21:00', 7)
INSERT INTO #values VALUES ('2012-05-03 14:56:00', 5)
INSERT INTO #values VALUES ('2012-05-03 14:57:00', 0.3)

DECLARE @timeSpanMinutes int SET @timeSpanMinutes=15
DECLARE @startTime datetime, @endTtime datetime
SELECT @startTime=MIN(time) FROM #values
SELECT @endTtime =DATEADD(MINUTE,(DATEDIFF(MINUTE,@startTime,MAX(time))
    /@timeSpanMinutes+1)*@timeSpanMinutes, @startTime) FROM #values -- MAX(time) multiple
SELECT intervals.start
  , SUM(value*(DATEDIFF(MINUTE -- minutes in intersection of [start,end] and [time,next]
     , CASE WHEN time<start THEN start ELSE time END -- Maximum(time,start)
     , CASE WHEN next<DATEADD(MINUTE,@timeSpanMinutes,intervals.start) THEN next 
            ELSE DATEADD(MINUTE,@timeSpanMinutes,intervals.start) END -- Minimum(next,end)
     )*1.0/@timeSpanMinutes)) as average
  FROM
  (SELECT DISTINCT DATEADD(MINUTE, (DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) AS start 
      FROM #values -- round start to multiple of @timeSpanMinutes
    UNION SELECT DISTINCT DATEADD(MINUTE,@timeSpanMinutes+(DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) 
      FROM #values -- union distinct with same as above but shifted with @timeSpanMinutes
  ) intervals -- intervals start time (end is calculated as start + @timeSpanMinutes)
  INNER JOIN 
  (SELECT v.*,ISNULL((SELECT MIN(time) FROM #values WHERE time>v.time),@endTtime) as next 
       FROM #values v -- add next column to #values
  ) vals
  ON vals.next>=intervals.start and vals.time<=DATEADD(MINUTE,@timeSpanMinutes,start)
  WHERE intervals.start<>@endTtime
  GROUP BY intervals.start
  ORDER BY intervals.start
Cosmin Rus
  • 330
  • 2
  • 8