2

I am trying to call an AzureML UDF from Stream Analytics query and that UDF expects an array of 5 rows and 2 columns. The input data is streamed from an IoT hub and we have two fields in the incoming messages: temperature & humidity.

This would be the 'passthrough query' :

SELECT GetMetadataPropertyValue([room-telemetry], 'IoTHub.ConnectionDeviceId') AS RoomId, 
       Temperature, Humidity
INTO
    [maintenance-alerts]
FROM
    [room-telemetry]

I have an AzureML UDF (successfully created) that should be called with the last 5 records per RoomId and that will return one value from the ML Model. Obviously, there are multiple rooms in my stream, so I need to find a way to get some kind of windowing of 5 records Grouped per RoomId. I don't seem to find a way to call the UDF with the right arrays selected from the input stream. I know I can create a Javascript UDF that would return an array from the specific fields, but that would be record/by record, where here I would need this with multiple records that are grouped by the RoomId.

Someone has any insights?

Best regards

Sam Vanhoutte
  • 3,247
  • 27
  • 48
  • Somewhat similar question came up last week. I asked around and it seems there's VERY few people who know both AML and ASL enough to answer this question. My advice would be to make it either more ASA or AML-oriented by paring down the references to the other. https://stackoverflow.com/questions/62995906/azure-stream-analytics-ml-service-function-call-in-cloud-job-results-in-no-outp – Anders Swanson Jul 31 '20 at 06:59
  • I tend to agree. it feels a bit complex to get it done, but I'll check if I get it solved anyhow. The question you pointed out at least seems to allow passing everything to a UDF (that Sequence variable), might check that out – Sam Vanhoutte Jul 31 '20 at 11:40
  • Hi, just to make sure of the logic, you need to call the ML function with the last 5 records for each room? Did you try COLLECTTOP, using the roomId in the group by? https://learn.microsoft.com/en-us/stream-analytics-query/collecttop-azure-stream-analytics – Jean-Sébastien Jul 31 '20 at 17:18
  • Hello JS, thanks for getting back. Yes, you got the logic right. CollectTop seems to be like the right aggregation function, indeed, but I don't seem to find good information on how to transform the returned fields (with rank/value) to a javascript array (just want a n*m array with the values of those records to be passed to the AzureML UDF) – Sam Vanhoutte Aug 04 '20 at 09:45

1 Answers1

1

After the good suggestion of @jean-sébastien and an answer to an isolated question for the array-parsing, I finally was able to stitch everything together in a solution that builds. (still have to get it to run at runtime, though).

So, the solution exists in using CollectTop to aggregate the latest rows of the entity you want to group by, including the specification of a Time Window.

And the next step was to create the javascript UDF to take that data structure and parse it into a multi-dimensional array.

This is the query I have right now:

-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT  engineid, tmp, hum, eventtime
    FROM    [engine-telemetry] 
    WHERE   engineid IS NOT NULL
),
-- Grouping by engineid in TimeWindows
TimeWindows AS
(
    SELECT engineid, 
        CollectTop(2) OVER (ORDER BY eventtime DESC) as TimeWindow
    FROM
        [RelevantTelemetry]
    WHERE engineid IS NOT NULL
    GROUP BY TumblingWindow(hour, 24), engineid
)
--Output timewindows for verification purposes
SELECT engineid, Udf.Predict(Udf.getTimeWindows(TimeWindow)) as Prediction
INTO debug
FROM TimeWindows

And this is the Javascript UDF:

    function getTimeWindows(input){
        var output = [];
        for(var x in input){
            var array = [];
            array.push(input[x].value.tmp);
            array.push(input[x].value.hum);
            output.push(array);
        }
        return output;
    }
Sam Vanhoutte
  • 3,247
  • 27
  • 48