0

I'm importing items from Cosmos db, however each item can potentially have a different schema structure. Below are a few examples of how they could look.

Ex.1

{
  "id": "01-000008",
  "pKey": "000008",
  "delete": false,
  "/": {
    "fwqs": false,
    "itemDescription": "Test Description one",
    "linkset": {
      "anchor": "Test anchor one",
      "itemDescription": "Test Description one",
      "https://test/testIdeas": [
        {
          "href": "https://test.com/test/",
          "title": "Where to test one",
          "fwqs": true,
          "public": true,
          "type": "text/html",
          "context": [
            "us"
          ],
          "hreflang": [
            "en"
          ]
        }
      ],
      "https://test/activityIdeas": [
        {
          "href": "https://test.com/test/",
          "title": "Where to buy",
          "fwqs": true,
          "public": true,
          "type": "text/html",
          "context": [
            "us"
          ],
          "hreflang": [
            "en"
          ]
        }
      ]
    }
  }
}

Ex.2

{
  "id": "01-000009",
  "pKey": "000009",
  "delete": false,
  "/10/101": {
    "fwqs": false,
    "itemDescription": "Test Description Two",
    "linkset": {
      "anchor": "Test anchor Two",
      "itemDescription": "Test Description Two",
      "https://test/testIdeas": [
        {
          "href": "https://test.com/test/",
          "title": "Where to test",
          "fwqs": true,
          "public": true,
          "type": "text/html",
          "context": [
            "us"
          ],
          "hreflang": [
            "en"
          ]
        }
      ]
    }
  }
}

Ex.3

{
  "id": "01-0000010",
  "pKey": "0000010",
  "delete": false,
  "/10/12345/AB/100001": {
    "fwqs": false,
    "itemDescription": "Test Description Three",
    "linkset": {
      "anchor": "Test anchor Three",
      "itemDescription": "Test Description Three",
      "https://test/testIdeas": [
        {
          "href": "https://test.com/test/",
          "title": "Where to test",
          "fwqs": true,
          "public": true,
          "type": "text/html",
          "context": [
            "us"
          ],
          "hreflang": [
            "en"
          ]
        }
      ]
    }
  }
}

I've made sure to clear the data set schema and the source projection in preparation to handle the dynamic key.

In Ex.1 in the unroll by I'm trying to access {/}.linkset.{https://test/testIdeas}. But because the "{/}" key is always changing I need to make this part dynamic somehow in order to also pick up the keys you find in Ex.2 & Ex.3.

I'm struggling to resolve the issue, I'm not sure what function should be used to identify a dynamic key and if it should be done in the "add dynamic content" or the "add column pattern" part of the "unroll by" option in the flatten activity.

desired results current results

flatten activity ideas

flatten test ideas

derived 'new_col' activity

Hoop
  • 3
  • 2

1 Answers1

0

One approach to solve this is renaming the dynamic key with a static name and then flattening that column. Since the key has the pattern starting with \, column which begins with \ can be renamed as new_column. Below are the detailed steps to rename the dynamic key.

  • One sample source JSON dataset is taken in source transformation. enter image description here

  • Then derived column transformation is taken to rename the column that starts with \ to new_col.

enter image description here

Dataflow script for derived column:

derive(each(match(startsWith(name,'/')), 'new_col'  =  $$)) ~> derivedColumn1
  • Then you can use flatten transformation and give the unroll by basis the new_column field.

enter image description here

By this way, flatten the Json with dynamic key can be flattened.

Aswin
  • 4,090
  • 2
  • 4
  • 16
  • thank you for your assistance. I'm having trouble processing multiple items (payloads) at a time with your fix. It works great when I process just the schema in EX.1, but when I process Ex.1 & Ex.2 at the same time it only picks up 1 attribute from the second payload. – Hoop May 17 '23 at 08:33
  • **but when I process Ex.1 & Ex.2 at the same time it only picks up 1 attribute from the second payload**.- What does it mean? In the question, you provided that source schema will be varying. You gave 3 such source sample data. – Aswin May 17 '23 at 09:06
  • Could you explain in detail? so that i will get clarified with the question and input data – Aswin May 17 '23 at 09:08
  • I've modified the example schemas, I've also added the desired results. During the first pipeline run I would like for all items in the cosmos container to be processed, for any one item I would like to split the item results between two SQL tables. During the second run (and with change feed activated) it will pull in only items that have been added or modified since the last run, this could be one item or multiple items. – Hoop May 17 '23 at 11:04
  • regarding **"but when I process Ex.1 & Ex.2 at the same time it only picks up 1 attribute from the second payload."** I couldn't achieve the desired results when I was testing your recommendation. – Hoop May 17 '23 at 11:12
  • Can you share the output you are getting ? – Aswin May 17 '23 at 11:21
  • I've added the output. In EX.1 you have '//test/testIdeas' & '//test/activityIdeas', but not all payloads will have this, some will just have '//test/testIdeas'. – Hoop May 17 '23 at 12:50
  • This is not issue in the approach . You are unrolling by the expression ` {/}.linkset.{https://test/testIdeas}` whereas `//test/activityIdeas',` is not unrolled. Could you give the proper expression in unroll by of flatten transformation and check – Aswin May 17 '23 at 13:00
  • I'm trying to unroll both testideas and activityIdeas from the same payload, based on the derived column 'new_col'. I've added screenshots. – Hoop May 17 '23 at 13:32
  • Note: {test/testIdeas}, {test/activityIdeas} are not rows for linkset. They are like columns of linkset. The approach that you used..unrolling by testideas only or activityideas only will not work. – Aswin May 17 '23 at 13:54
  • Based on the example schemas and the desired output, is there a way this can be achieved? – Hoop May 17 '23 at 15:06
  • You can ask new question in this platform for this with clear explanation of what is required. This would help the commuity to understand the requirements clearly. – Aswin May 18 '23 at 02:52
  • The initial version of question is `In Ex.1 in the unroll by I'm trying to access {/}.linkset.{https://test/testIdeas}. But because the "{/}" key is always changing I need to make this part dynamic somehow in order to also pick up the keys you find in Ex.2 & Ex.3`. As [this](https://stackoverflow.com/a/76252379/19986107) answers the question, you can raise the new question for new requirements. – Aswin May 18 '23 at 02:59