
I am receiving data from some sensors through a few different bridges. The data I receive contains a lot of duplicates: the same serialNo, the same values and (almost) the same dateTime, but from different bridges. The data doesn't include any unique event ID, only a timestamp that is unique for every single event, even when the event is a duplicate. Therefore I cannot filter on it.

Here is an example:

{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1583750353969,"dateTime":"2020-03-09T10:39:13Z","serialNo":"02001703","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"AE8B2FC5","rssi":-25,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":15.8,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":39,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001703","vif":7,"dif":27,"rssiWmbus":-94,"EventProcessedUtcTime":"2020-03-09T11:54:07.5197619Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-09T10:39:14.0440000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1583750354377,"dateTime":"2020-03-09T10:39:14Z","serialNo":"02001703","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"01000000","rssi":-35,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":15.8,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":39,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001703","vif":7,"dif":27,"rssiWmbus":-80,"EventProcessedUtcTime":"2020-03-09T11:54:07.5197619Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-09T10:39:14.4190000Z"}

Is there some way to filter out the duplicates in Stream Analytics? The data is eventually going to Power BI as well, so doing it there would also be an option. But the "Remove duplicates" feature in Power BI needs some kind of event ID that differs between distinct events but is the same for the duplicated data.

Thanks in advance!

skh

2 Answers

0

According to your description, you just want a DISTINCT feature, similar to what relational databases offer, so that you can filter out rows based on some columns.

Actually, that is supported in ASA, with some limitations. The main idea is to use the COUNT and GROUP BY keywords.

For example, with my test data as below:

[image: test data]

SQL:

SELECT COUNT(DISTINCT b.timestamp), b.dsType, b.mrfCuId
FROM blobstream b
GROUP BY b.dsType, b.mrfCuId, TumblingWindow(minute, 5)

Output:

[image: query output]

I got some clues from this official example.
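To make the effect of this kind of query concrete, here is a rough Python model of what COUNT(DISTINCT ...) with GROUP BY over a tumbling window computes. It is only a sketch and not Stream Analytics code: the function name `count_distinct_per_window` is made up for illustration, and the window alignment is simplified to integer division on the epoch-millisecond timeStamp. The two events are trimmed copies of the two sample rows from the question.

```python
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window, as in the query above


def count_distinct_per_window(events, window_ms=WINDOW_MS):
    """Count distinct timeStamp values per (dsType, mrfCuId, window start)."""
    seen = defaultdict(set)
    for e in events:
        # Simplified alignment: each event belongs to exactly one window.
        window_start = e["timeStamp"] // window_ms * window_ms
        seen[(e["dsType"], e["mrfCuId"], window_start)].add(e["timeStamp"])
    return {key: len(stamps) for key, stamps in seen.items()}


# Trimmed copies of the two duplicate events from the question.
events = [
    {"dsType": "WMBUS", "mrfCuId": "B827EBE84EEB", "timeStamp": 1583750353969},
    {"dsType": "WMBUS", "mrfCuId": "B827EBE84EEB", "timeStamp": 1583750354377},
]
counts = count_distinct_per_window(events)
```

In this model the result is one count per group and window, not the events themselves, and because every event carries its own timeStamp, the two copies relayed by different bridges still count as two distinct values.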

Jay Gong
  • Thanks! So if I understand you correctly, this query selects one of the rows with the same mrfCuId and dsType within a window of 5 minutes? Almost all of the data has dsType = "WMBUS" and mrfCuId = "B827EBE84EEB". I am interested in separating the events where the serial number is the same, so can I just group by that and use a tumbling window of 2-3 seconds? – skh Mar 16 '20 at 11:37
  • I tried changing the query to use serialNo and a tumbling window of 2 seconds, and just got "null" in return. Another problem is that I don't quite know how to incorporate the other query I made, which separates the different elements of the array: SELECT event.serialNo, CAST(event.dateTime AS datetime) AS TimeAndDate, DataP.ArrayValue.name, event.bridgeId, DataP.ArrayValue.value, DataP.ArrayValue.valueType INTO TestOuput FROM eventhubInput AS event CROSS APPLY GetArrayElements(event.datapoint) AS DataP WHERE DataP.ArrayValue.valueType = 'CSV' – skh Mar 16 '20 at 14:05
  • @skh Hi, I'm confused: did you get the null value from running the job, or from the test process in the Azure portal UI? – Jay Gong Mar 17 '20 at 09:28
  • @skh Or maybe you could add some sample data to your question, together with the ASQL you provided in your last comment, so that I can test it on my side. – Jay Gong Mar 17 '20 at 09:29
  • Thank you very much! I have posted the sample data in an answer below, since I could not post an image in a comment :) – skh Mar 17 '20 at 11:54
  • Were you able to see the sample data I posted as an answer? :) – skh Mar 18 '20 at 16:31
  • @skh Sure, I will take some time to look at it. I'm trying my best to follow this case. – Jay Gong Mar 19 '20 at 02:54
0

I could not post a picture in a comment, so I am writing my answer here instead.

This is the output I get when running the query I posted in my comment. Here you can see that I have extracted some of the wanted values from the array in every row. As you can see, rows 3 & 4 are exactly the same as rows 1 & 2, just from different bridges. The same goes for rows 7 & 8 and 9 & 10. So ideally I want just one copy of each correct reading, not the duplicates shown in this example.

Here is some more sample data if you want to test it yourself:

{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355883141,"dateTime":"2020-03-16T10:51:23Z","serialNo":"02001771","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"5D410D00","rssi":-67,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":18.2,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":28,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001771","vif":7,"dif":27,"rssiWmbus":-16,"EventProcessedUtcTime":"2020-03-16T10:51:23.2682714Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:23.2420000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355898659,"dateTime":"2020-03-16T10:51:38Z","serialNo":"02001596","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"AE8B2FC5","rssi":-24,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":13.1,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":35,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001596","vif":7,"dif":27,"rssiWmbus":-45,"EventProcessedUtcTime":"2020-03-16T10:51:38.8337473Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:38.7330000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355898715,"dateTime":"2020-03-16T10:51:38Z","serialNo":"02001596","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"5D410D00","rssi":-67,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":13.1,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":35,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001596","vif":7,"dif":27,"rssiWmbus":-16,"EventProcessedUtcTime":"2020-03-16T10:51:38.8337473Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:38.8110000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355904394,"dateTime":"2020-03-16T10:51:44Z","serialNo":"02001704","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"AE8B2FC5","rssi":-24,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":19.2,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":26,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001704","vif":7,"dif":27,"rssiWmbus":-58,"EventProcessedUtcTime":"2020-03-16T10:51:44.5783305Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:44.4680000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355904737,"dateTime":"2020-03-16T10:51:44Z","serialNo":"02001704","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"5D410D00","rssi":-67,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":19.2,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":26,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001704","vif":7,"dif":27,"rssiWmbus":-16,"EventProcessedUtcTime":"2020-03-16T10:51:44.9080895Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:44.7960000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355907295,"dateTime":"2020-03-16T10:51:47Z","serialNo":"02001701","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"AE8B2FC5","rssi":-24,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":16.2,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":28,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001701","vif":7,"dif":27,"rssiWmbus":-86,"EventProcessedUtcTime":"2020-03-16T10:51:47.4262897Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:47.3750000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355908044,"dateTime":"2020-03-16T10:51:48Z","serialNo":"02001701","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"5D410D00","rssi":-67,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":16.2,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":28,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001701","vif":7,"dif":27,"rssiWmbus":-16,"EventProcessedUtcTime":"2020-03-16T10:51:48.1936261Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:48.1250000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1584355918798,"dateTime":"2020-03-16T10:51:58Z","serialNo":"02001701","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"AE8B2FC5","rssi":-24,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":16.2,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":28,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001701","vif":7,"dif":27,"rssiWmbus":-92,"EventProcessedUtcTime":"2020-03-16T10:51:58.9619079Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-16T10:51:58.8610000Z"}
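To pin down the rule I am after, here is a minimal Python sketch of the filtering logic. It is not Stream Analytics syntax, only an illustration of the intended behaviour. The function name `deduplicate`, the 3-second `max_gap_ms` threshold and the flattened `values` field (temperature, humidity) are assumptions made for this sketch. The events are trimmed copies of four of the sample rows above.

```python
def deduplicate(events, max_gap_ms=3000):
    """Keep the first copy of each reading and drop copies from other bridges.

    Two events count as the same reading when serialNo and the values match
    and their timeStamp values are at most max_gap_ms apart.
    """
    last_kept = {}  # (serialNo, values) -> timeStamp of the last kept event
    kept = []
    for e in sorted(events, key=lambda e: e["timeStamp"]):
        key = (e["serialNo"], tuple(e["values"]))
        previous = last_kept.get(key)
        # Compare against the last KEPT event, so a later genuine reading
        # with unchanged values is not suppressed.
        if previous is None or e["timeStamp"] - previous > max_gap_ms:
            kept.append(e)
            last_kept[key] = e["timeStamp"]
    return kept


# Trimmed sample rows; "values" is a hypothetical flattening of datapoint[].value.
events = [
    {"serialNo": "02001771", "bridgeId": "5D410D00", "timeStamp": 1584355883141, "values": [18.2, 28]},
    {"serialNo": "02001701", "bridgeId": "AE8B2FC5", "timeStamp": 1584355907295, "values": [16.2, 28]},
    {"serialNo": "02001701", "bridgeId": "5D410D00", "timeStamp": 1584355908044, "values": [16.2, 28]},
    {"serialNo": "02001701", "bridgeId": "AE8B2FC5", "timeStamp": 1584355918798, "values": [16.2, 28]},
]
kept = deduplicate(events)
```

With these rows, the copy of sensor 02001701 that arrives 749 ms later through bridge 5D410D00 is dropped, while the reading about 11.5 seconds later is kept even though its values are unchanged. A fixed tumbling window would not quite match this rule, because two copies can fall on either side of a window boundary.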
double-beep
skh