2

I am trying to do a join in stream analytics with a reference data.

Below is the input from stream data.

[{
               "id":"111111101",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"470A01",
                                             "id":1
                              },
                              {
                                             "xxx":0,
                                             "yyy":0,
                                             "aaa":"000000",
                                             "id":61
                              }
               ]

},
{
               "id":"111111102",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"03F4EB",
                                             "id":1
                              }
               ]
},
{
               "id":"111111103",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"6706",
                                             "id":1
                              }
              ]
}
]

Below is the reference Master data.

[
{
    "aaa": "470A01"
  },
  {
    "aaa": "03F4EB"
  },
  {
    "aaa": "710211"
  }
]

The SAQL written is as shown below.

WITH INPUT1 AS (
    SELECT
    input.id.dateTime AS ID,
    flatArrayElement as ABC,
    FROM [signals2] as input
    CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
    )

I have used CROSS APPLY to get each element in xyz as different rows.

The output of INPUT1 is as shown below.

+----------+------------------------------------------------------------------------+
|       ID |   ABC                                                                  |
+----------+------------------------------------------------------------------------+
| 111111101| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} |
| 111111101| {"ArrayValue":{"xxx":0,"yyy":0,"aaa":000000,"id":61},"ArrayIndex":1}   |
| 111111102| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} |
| 111111103| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":6706,"id":1},"ArrayIndex":0}   |
+-------------------+---------------------------------------------------------------+

Now I am trying to join the data xyz.aaa with a reference data as below, where master is the reference data.

SIGNALS AS (
 SELECT * FROM INPUT1 I JOIN master M ON I.ABC.ArrayValue.aaa = M.aaa

I am getting below output, but the problem is xyz with more than one element is duplicating in the output.

+-------------------------------+------------------------------------------------------------------------------------------+--------+
| i___timestamp                 |   i                                                                                      |  m     |     
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111102",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} | 03F4EB |
+-------------------------------+------------------------------------------------------------------------------------------+--------+

I am confused why the first two rows are duplicating,it should be only one entry for that. Out of two elements in xyz one is valid and one is invalid. But here the valid element is repeating two times. What might be the reason? How to resolve this ?

Antony
  • 970
  • 3
  • 20
  • 46

1 Answers1

1

Please see my query sql:

WITH INPUT1 AS (
    SELECT
        input.name as name,
        flatArrayElement as ABC
    FROM
        [YourInputAlias] as input
        CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
)

SELECT INPUT1.ABC.ArrayValue.aaa FROM INPUT1
JOIN jayrefer on INPUT1.ABC.ArrayValue.aaa = jayrefer.item

jayrefer is your Master Reference:

enter image description here

input is your Input:

enter image description here

Output:

enter image description here


Just for summary,the issue is caused by the duplicate raws in the reference data so that the final results is duplicate.

The final sql is:

WITH INPUT AS (
    SELECT
    input1.id.dateTime AS ID,
    flatArrayElement as ABC
    FROM input1
    CROSS APPLY GetArrayElements(input1.xyz) AS flatArrayElement
)
SELECT * FROM INPUT I JOIN jayrefer M ON I.ABC.ArrayValue.aaa = M.aaa
Jay Gong
  • 23,163
  • 2
  • 27
  • 32
  • -this is working till an extend. Only one issue is if the data is in ```"xyz": [ {"id":1,"aaa":2631}, {"id":2,"aaa":3633} ]``` format and only aaa: 2631 exists in master data then aaa:2631 repeats 2 times, which is not expected. – Antony Nov 11 '19 at 07:19
  • @Antony I can't see aaa:2631 repeats 2 times, could you please share your sql? – Jay Gong Nov 11 '19 at 08:22
  • Below is the query ```WITH INPUT1 AS ( SELECT input.id AS ID, flatArrayElement as ABC FROM [signals2] as input CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement ), SIGNALS AS ( SELECT I.ID,I.ABC.ArrayValue.aaa FROM INPUT1 I JOIN dtcmaster M ON I.ABC.ArrayValue.aaa = M.AAA ) select * into outputTest from INPUT1 select * into outputTest01 from SIGNALS``` @Jay Gong – Antony Nov 12 '19 at 10:59
  • Input is not as you have shown in the answer. For each aaa it will be different row as shown in the question. But in your sample the input xyz is having both aaa in a single line. In the input given by u, aaa column shouldn't be there. – Antony Nov 12 '19 at 11:21
  • @Antony Sorry,your sql can't be passed through in the ASA portal,you could post append it in your question which maybe more clear. BTW,i don't understand you said my `xyz` is having both aaa in a single line. My `xyz` is an array which contains 2 objects,each object has 2 properties :`id` and `aaa`.It's not match with your data,am i right? However,i did followed your `xyz` format in your question. Maybe i made a mistake here? i'm confused... – Jay Gong Nov 13 '19 at 02:32
  • @jayGong-I have updated the question with sample and reference data a dn with exact issue i am facing. – Antony Nov 13 '19 at 05:22
  • @Antony I already summarized the issue at the end of my answer.You could end it,thanks. – Jay Gong Nov 13 '19 at 07:29
  • I took master data from CosmosDB by running distinct query but it seems like it has duplicate values. Thanks for your time and effort.@Jay Gong – Antony Nov 13 '19 at 07:52
  • @Antony I'm familiar with cosmos, would you please post some detailed info? Or create a new case? – Jay Gong Nov 13 '19 at 07:54