I'm working on a system of temperature and pressure sensors, where my data is flowing through a Stream analytics job. Now there maybe duplicate messages sent in because of acknowledgements not being received and various other reasons. So my data could be of the format:-
DeviceID TimeStamp MeasurementName Value
1 1 temperature 50
1 1 temperature 50
1 2 temperature 60
Note that the 2nd record is a duplicate of the 1st one as DeviceId and Timestamp and MeasurementName are same. I wish to take an average over 5 min tumbling window for this data in the stream analytics job. So I have this query
SELECT
AVG(Value)
FROM
SensorData
GROUP BY
DeviceId,
MeasurementName,
TumblingWindow(minute, 5)
This query is expected to give me average measurement of temperature and pressure values for each device in 5 min. In doing this average I need to eliminate duplicates. The actual average is (50+60)/2 = 55. But the average given my this query will be (50+50+60)/3 = 53.33
How do I tweak this query for the right output?
Thanks in advance.