2

I have an Azure Stream Analytics job that uses an EventHub and a Reference data in Blob storage as 2 inputs. The reference data is CSV that looks something like this:

REGEX_PATTERN,FRIENDLY_NAME
115[1-2]{1}9,Name 1
115[3-9]{1}9,Name 2

I then need to lookup an attribute in the incoming event in EventHub against this CSV to get the FRIENDLY_NAME.

Typical way of of using reference data is using JOIN clause. But in this case I cannot use it because such regex matching is not supported with LIKE operator.

UDF is another option, but I cannot seem to find a way of using reference data as a CSV inside the function.

Is there any other way of doing this in an Azure Stream Analytics job?

user_name
  • 119
  • 1
  • 8
  • What is the join key between event hub and reference data? `REGEX_PATTERN`? – Jay Gong Apr 30 '20 at 07:17
  • @JayGong yes, it is the join key. – user_name Apr 30 '20 at 07:35
  • Understand,but i think the key should be specific, not regex value in ASA job. So it is your pain point, right? – Jay Gong Apr 30 '20 at 07:39
  • @JayGong Correct. Which is why I doubt a `JOIN` would work in my case. The alternative I could think was, was using an UDF, but was wondering if I can use the entire list of reference data in the UDF to perform the lookup. – user_name Apr 30 '20 at 07:44
  • As a last fallback I am thinking of just hardcoding the reference data in the UDF. But I need this working for several different ASA jobs that use different reference data. So things will get ugly there. – user_name Apr 30 '20 at 07:46

1 Answers1

1

As I know, the JOIN is not supported in your scenario. The join key should be specific, can't be a regex value.

Thus, reference data is not suitable here because it should be used in the ASA sql like below:

SELECT I1.EntryTime, I1.LicensePlate, I1.TollId, R.RegistrationId  
FROM Input1 I1 TIMESTAMP BY EntryTime  
JOIN Registration R  
ON I1.LicensePlate = R.LicensePlate  
WHERE R.Expired = '1'

The join key is needed. What I mean is that the reference data input is not needed even here.

Your idea is using UDF script and load the data in the UDF to compare with the hardcode regex data. This idea is not easy to maintain. Maybe you could consider my workaround:

1.You said you have different reference data,please group them and store as json array. Assign one group id to every group. For example:

Group Id 1:

[
    {
        "REGEX":"115[1-2]{1}9",
        "FRIENDLY_NAME":"Name 1"
    },
    {
        "REGEX":"115[3-9]{1}9",
        "FRIENDLY_NAME":"Name 2"
    }
]

....

2.Add one column to referring group id and set Azure Function as Output of your ASA SQL. Inside Azure Function, please accept the group id column and load the corresponding group of json array. Then loop the rows to match the regex and save the data into destination residence.

I think Azure Function is more flexible then UDF in ASA sql job. Additional,this solution is maybe easier to maintain.

halfer
  • 19,824
  • 17
  • 99
  • 186
Jay Gong
  • 23,163
  • 2
  • 27
  • 32
  • I get the general idea of this solution. The current ASA job outputs to another EvenHub, which now needs to be changed to an Azure Function, let the Azure function do the regex lookup work and then sink to the Event Hub using custom code. This would mean substantial architecture change to our jobs and I was looking for a simpler solution using just the ASA constructs. Thanks for the idea and will keep this mind if we don't find another way out of this. – user_name Apr 30 '20 at 11:27
  • 1
    @user_name Yeah! I understand this is maybe too reverse for you. You could keep it as a workaround. Thank you anyway. – Jay Gong Apr 30 '20 at 12:02