
I'm trying to get my head around the hopping window in Azure Stream Analytics. I receive the following data from an Azure Event Hub:

[
  {
    "Id": "1",
    "SensorData": [
      {
        "Timestamp": 1603112431,
        "Type": "LineCrossing",
        "Direction": "forward"
      },
      {
        "Timestamp": 1603112431,
        "Type": "LineCrossing",
        "Direction": "forward"
      }
    ],
    "EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2020-10-20T06:35:48.3540000Z"
  },
  {
    "Id": "1",
    "SensorData": [
      {
        "Timestamp": 1603112430,
        "Type": "LineCrossing",
        "Direction": "backward"
      }
    ],
    "EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
    "PartitionId": 0,
    "EventEnqueuedUtcTime": "2020-10-20T06:35:48.2140000Z"
  }
]

My query looks like the following:

SELECT s.Id, COUNT(data.ArrayValue.Direction) as Count
FROM [customers] s TIMESTAMP BY EventEnqueuedUtcTime
CROSS APPLY GetArrayElements(s.SensorData) AS data
WHERE data.ArrayValue.Type = 'LineCrossing' 
AND data.ArrayValue.Direction = 'forward'
GROUP BY s.Id, HoppingWindow(second, 3600, 5)

I used a hopping window to get, every 5 seconds, all events from the last hour (3600 seconds). For the data above I expected one row with Id 1 and Count 2, but what I receive is 720 rows (3600 divided by 5), each with Id 1 and Count 2.

Shouldn't those events be aggregated by the HoppingWindow function?

Link
  • Are you using local inputs for testing? – kgalic Oct 20 '20 at 09:17
  • I just pushed some events to the Event Hub and sampled that data in the Stream Analytics portal – Link Oct 20 '20 at 09:34
  • I think the problem was the local input, since it simulates the complete window, which is why you see 3600/5 results. Below I answered and tested with the cloud input, with some changed parameters, and it should work according to my understanding of your expectation :) – kgalic Oct 20 '20 at 09:43
  • I didn't know the query window is simulating the whole timespan. – Link Oct 20 '20 at 11:47
  • No, I am saying that if you select local input for testing the query, you get what you saw; I experienced the same. After I switched to cloud input for testing (getting events from Event Hub), it worked as expected with the query below. – kgalic Oct 20 '20 at 11:48
  • I confirm that the difference between local testing (or testing from a sample) and live processing is that in testing ASA assumes it has all events from the beginning to the end of time, so it returns every window at once. In addition to the answer from kgalic, I suggest adding the window timestamp so you can see that the windows are generated one by one as time progresses: SELECT ... , System.Timestamp() as time_of_end_of_the_window – Jean-Sébastien Oct 20 '20 at 22:32

1 Answer

I restructured your query as follows:

WITH inputValues AS (
    SELECT input.*, message.ArrayValue AS Data
    FROM input
    CROSS APPLY GetArrayElements(input.SensorData) AS message
)

SELECT inputValues.Id, COUNT(Data.Direction) AS Count
INTO output
FROM inputValues
WHERE Data.Type = 'LineCrossing' AND Data.Direction = 'forward'
GROUP BY inputValues.Id, HoppingWindow(second, 3600, 5)

I set the input to Event Hub, and in Visual Studio I started the query with the cloud input.

I used a Windows client application to send the messages to Event Hub (item 2 in the picture below) and observed that events arrived every 5 seconds (items 1 and 3 in the picture below).

You may need to adjust the query I shared to use the correct timestamp, but the result should be as expected: every 5 seconds a count is written to the output, per the defined condition, for all events that arrived in the last hour (3600 seconds in the HoppingWindow function).

enter image description here
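
To see where the 720 rows in the question come from, here is a minimal Python sketch (not Stream Analytics code) that lists every hopping window containing a single event. It assumes that windows end on multiples of the hop size counted from the Unix epoch and that each window covers the interval (end - size, end]; the function name `window_ends_containing` is made up for this illustration.

```python
import math
from datetime import datetime, timezone


def window_ends_containing(event_ts: float, window_size: int, hop: int) -> list[int]:
    """Return the end times (epoch seconds) of all hopping windows containing event_ts.

    Assumption: window ends fall on multiples of `hop` since the Unix epoch,
    and a window covers (end - window_size, end].
    """
    ends = []
    # First window end at or after the event time.
    end = math.ceil(event_ts / hop) * hop
    # Keep hopping forward while the window still starts before the event.
    while end - window_size < event_ts:
        ends.append(end)
        end += hop
    return ends


# EventEnqueuedUtcTime of the first event in the question.
enqueued = datetime(2020, 10, 20, 6, 35, 48, 354000, tzinfo=timezone.utc)
ends = window_ends_containing(enqueued.timestamp(), window_size=3600, hop=5)

print(len(ends))  # 720
print(datetime.fromtimestamp(ends[0], tz=timezone.utc).isoformat())
```

Each event belongs to 3600 / 5 = 720 overlapping windows. In a live job those windows close one at a time, every 5 seconds over the following hour, while a test run on sampled data returns all of them at once.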

kgalic