1

I have the following input (testing in the Azure portal) that I have uploaded:

[
  {"engineid":"engine001","eventtime":1,"tmp":19.3,"hum":0.22},
  {"engineid":"engine001","eventtime":2,"tmp":19.7,"hum":0.21},
  {"engineid":"engine002","eventtime":3,"tmp":20.4,"hum":0.25},
  {"engineid":"engine001","eventtime":4,"tmp":19.6,"hum":0.24}
]

Then I try to get the records group, so that I have the 2 last rows for every engine. As you can see in the sample, I only have 2 different engines, so I would expect an output with two records that each contain the ranked records, but I am getting 4 output records.

This is my query:

-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT  engineid, tmp, hum, eventtime
    FROM    [engine-telemetry] 
    WHERE   engineid IS NOT NULL
),
-- Grouping by engineid in TimeWindows
TimeWindows AS
(
    SELECT engineid, 
        CollectTop(2) OVER (ORDER BY eventtime DESC) as TimeWindow
    FROM
        [RelevantTelemetry]
    WHERE engineid IS NOT NULL
    GROUP BY SlidingWindow(hour, 24), engineid
)
--Output timewindows for verification purposes
SELECT TimeWindow
INTO debug
FROM TimeWindows

I played with the TIMESTAMP BY property, changed the order of GROUP BY, etc, but still I keep having the following 4 records, rather than the 2 I was expecting:

Any idea?

[
{"TimeWindow":
  [
    {"rank":1,"value": "engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
  ]},
{"TimeWindow":
  [
    {"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
    {"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}},
    {"rank":3,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
  ]},
{"TimeWindow":
  [
    {"rank":1,"value":{"engineid":"engine002","tmp":0.0017,"hum":0.0003,"eventtime":3}}
  ]},
{"TimeWindow":
  [
    {"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
    {"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}}
  ]}
]
Sam Vanhoutte
  • 3,247
  • 27
  • 48

1 Answers1

1

As suggested by @SteveZhao, you need to use GROUP BY TumblingWindow(hour, 24), engineid instead of GROUP BY SlidingWindow(hour, 24), engineid

Sliding window can overlap entries based on time intervalenter image description here

For more information refer: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

Pranav Singh
  • 17,079
  • 30
  • 77
  • 104
  • Thanks, Pranav & Steve. That seems to drive better results. I thought sliding window would always keep the latest N records available and am afraid that TumblingWindow might not have N records, if they cross the boundary and a new Window has started... – Sam Vanhoutte Aug 10 '20 at 11:52