
I have some devices in the field sending data via GSM and losing the connection from time to time. As I have limited disk space, I tend to lose some data during the periods without connection, so I would like to evaluate the amount of pending data to get some insight into the situation.

If I reduce the table on my server to the columns I need, it looks like this:

DECLARE @table AS TABLE (
    timestamp   DATETIME,
    lastchanged DATETIME
);
INSERT INTO @table
VALUES 
('2019-04-01 12:27:23', '2019-04-01 12:26:17'),
('2019-04-01 12:27:23', '2019-04-01 12:25:47'),
('2019-04-01 12:27:23', '2019-04-01 12:26:17'),
('2019-04-01 12:27:23', '2019-04-01 12:26:03'),
('2019-04-01 12:27:23', '2019-04-01 12:26:20'),
('2019-04-01 12:28:23', '2019-04-01 12:25:52'),
('2019-04-01 12:28:23', '2019-04-01 12:26:22'),
('2019-04-01 12:28:23', '2019-04-01 12:26:18'),
('2019-04-01 12:28:23', '2019-04-01 12:25:54'),
('2019-04-01 12:29:23', '2019-04-01 12:25:47'),
('2019-04-01 12:29:23', '2019-04-01 12:26:17'),
('2019-04-01 12:29:23', '2019-04-01 12:25:47'),
('2019-04-01 12:29:23', '2019-04-01 12:25:45'),
('2019-04-01 12:30:23', '2019-04-01 12:26:17'),
('2019-04-01 12:30:23', '2019-04-01 12:25:47'),
('2019-04-01 12:30:23', '2019-04-01 12:26:17'),
('2019-04-01 12:31:23', '2019-04-01 12:26:03'),
('2019-04-01 12:31:23', '2019-04-01 12:26:20'),
('2019-04-01 12:31:23', '2019-04-01 12:25:52'),
('2019-04-01 12:31:23', '2019-04-01 12:26:22'),
('2019-04-01 12:31:23', '2019-04-01 12:26:18'),
('2019-04-01 12:31:23', '2019-04-01 12:25:54'),
('2019-04-01 12:32:23', '2019-04-01 12:25:47'),
('2019-04-01 12:32:23', '2019-04-01 12:26:17'),
('2019-04-01 12:32:23', '2019-04-01 12:25:47'),
('2019-04-01 12:32:23', '2019-04-01 12:25:45');

As the value of timestamp (first column) is created at the server and lastchanged (second column) is a timestamp from the device, the difference obviously is the delay caused by the sending process.

For every sensor on my device I get a separate entry, so I just need to count rows to get an idea of how many data points were stored on the device in a specific range of time (this varies, depending on the behavior of the device).

Now I would like to group by timestamp and evaluate, for each of these timestamps, how many datasets had already been created but had not reached the server so far.

So, for every timestamp I would like to know the number of rows in the dataset that have a timestamp in the future but a lastchanged in the past.
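
To make the intended count concrete: for a single reference timestamp it can be checked by hand with a query like the one below (a minimal sketch against the sample @table above, run in the same batch as the DECLARE, using 12:28:23 as the reference point). It should return 17 for this sample, since every lastchanged in the sample lies before 12:28:23 and 17 rows have a later server timestamp.

-- Rows already created on the device before the reference point
-- (lastchanged in the past) but only received by the server afterwards
-- (timestamp in the future).
SELECT COUNT(*) AS CountPending
FROM @table
WHERE [timestamp] > '2019-04-01 12:28:23'
  AND lastchanged < '2019-04-01 12:28:23';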

As my dataset is very big, I would like to avoid any cursor fetching.

And yes, this does not have to be a real-time analysis; I just need it to understand the behavior of the device and adjust some parameters.

I already tried this:

SELECT
    A.timestamp,
    COUNT(case when (A.timestamp < B.timestamp AND A.timestamp > B.lastchanged  ) then 1 else null end) AS CountPending
FROM @table A, @table B
GROUP BY A.timestamp
ORDER BY A.timestamp

I think that CTEs and subqueries will not help, as I need to do the filtering and counting in one step. And as I understand it, classical running windows in SQL require a fixed number of entries for the window range, but in my dataset this number is not constant at all.

Applied to the example above, the result is just wrong:

2019-04-01 12:27:23.000 105
2019-04-01 12:28:23.000 68
2019-04-01 12:29:23.000 52
2019-04-01 12:30:23.000 30
2019-04-01 12:31:23.000 24
2019-04-01 12:32:23.000 0
Wien.MC
  • First of all, your `FROM` clause will cause a cartesian product, which is usually very bad. What are you trying to accomplish by using `@table` twice? – tgolisch Apr 17 '19 at 17:16
  • Yes, you are right, but I think I need something like a self-join: for every (grouped) timestamp in A I would like to count those rows in B which have a younger lastchanged (because the entry was already created) but not a younger timestamp (because the data is pending, still on the way to the server, so the timestamp should be older than the current one). So somehow I need to compare every single timestamp entry with all other lastchanged and timestamp entries. I guess that will only be possible with a cartesian product. But I might be totally wrong. – Wien.MC Apr 17 '19 at 20:09
  • Please tag your question properly. `SQL` is just a language used by multiple dbms vendors. – Eric Apr 17 '19 at 21:16

2 Answers


As a possible answer to my own question, I would like to show a cursor-based option that works and produces the output I am looking for. But as I need to work on an Azure SQL Data Warehouse, cursor fetching is not possible in a native way. I integrated my table as an external resource on a classic SQL Server on Azure and ran the cursor there; it needs 9 minutes for 80k rows with 391 distinct timestamps among them. So this is just not a good option.

But the working solution looks like this (I focused on a one-day range and a single ID only to reduce the amount of data):

CREATE TABLE #st 
(
    [timestamp]   DATETIME,
    CountPending  INT
);

DECLARE
    @timestamp      DATETIME,
    @CountPending   INT = 0,
    @troID          INT,
    @Start          DATETIME,
    @Stop           DATETIME;

SET @troID = 127
SET @Start = CAST('2019-04-01T00:00:00' AS DATETIME)
SET @Stop = CAST('2019-04-02T00:00:00' AS DATETIME)

DECLARE c CURSOR
    LOCAL STATIC FORWARD_ONLY READ_ONLY
    FOR
    SELECT DISTINCT timestamp
    FROM test.SensorData
    WHERE troID = @troID AND timestamp BETWEEN @Start AND @Stop;

OPEN c;

FETCH NEXT FROM c INTO @timestamp;

WHILE @@FETCH_STATUS = 0
BEGIN

    SET @CountPending = (
        SELECT COUNT(*)
        FROM test.SensorData
        WHERE troID = @troID AND 
            timestamp BETWEEN @Start AND @Stop AND 
            @timestamp < timestamp AND @timestamp > lastchanged)

    INSERT #st([timestamp], CountPending)
        SELECT @timestamp, @CountPending;

    FETCH NEXT FROM c INTO @timestamp;
END

CLOSE c;
DEALLOCATE c;

SELECT [timestamp], CountPending
    FROM #st
    ORDER BY [timestamp]; 
Wien.MC

This should translate into a subquery, something like this:

SELECT a.troID, a.timestamp, MAX(x.c)
FROM test.SensorData a
    CROSS APPLY
        (
        SELECT COUNT(*) c
        FROM test.SensorData b
        WHERE a.troID = b.troID
          AND a.timestamp < b.timestamp 
          AND a.timestamp > b.lastchanged
        ) x
GROUP BY a.troID, a.timestamp
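
For reference, the same pattern can be checked against the sample @table from the question; a minimal sketch, assuming the table variable is declared in the same batch (the sample has no troID column, so the correlation is on timestamp/lastchanged only):

-- Same CROSS APPLY pattern applied to the question's sample data;
-- x.c depends only on a.timestamp, so MAX() just picks that value per group.
SELECT a.[timestamp], MAX(x.c) AS CountPending
FROM @table a
    CROSS APPLY
    (
        SELECT COUNT(*) AS c
        FROM @table b
        WHERE a.[timestamp] < b.[timestamp]
          AND a.[timestamp] > b.lastchanged
    ) x
GROUP BY a.[timestamp]
ORDER BY a.[timestamp];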

I would guess hash-distributing on troID might help, but I would need to know more about your data, its distribution and volumes.
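
On Azure SQL Data Warehouse the distribution is chosen when the table is created; a hypothetical CTAS sketch (the target name SensorData_hashed is made up here, and the column list is assumed from the cursor example above):

-- Hypothetical sketch: hash-distribute the sensor data on troID so that the
-- self-join inside the CROSS APPLY stays local to each distribution.
CREATE TABLE test.SensorData_hashed
WITH
(
    DISTRIBUTION = HASH(troID),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT troID, [timestamp], lastchanged
FROM test.SensorData;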

wBob