
I have a requirement to convert JSON into CSV (or a SQL table, or any other flat structure) using Data Flow in Azure Data Factory. I need to take the property names at one level of the hierarchy and the values of the child properties at a lower level of the hierarchy from the source JSON, and add both as column/row values in the CSV or other flat structure.

Source Data Rules/Constraints:

  1. Parent-level property names change dynamically (e.g. ABCDataPoints, CementUse, CoalUse, ABCUseIndicators are dynamic).
  2. The hierarchy always remains the same as in the sample JSON below.

I need some help defining a JSON path/expression to get the names ABCDataPoints, CementUse, CoalUse, ABCUseIndicators, etc. I have already figured out how to retrieve the values of the properties Value, ValueDate, ValueScore and AsReported.

Source Data Structure:

{
    "ABCDataPoints": {
        "CementUse": {
            "Value": null,
            "ValueDate": null,
            "ValueScore": null,
            "AsReported": [],
            "Sources": []
        },
        "CoalUse": {
            "Value": null,
            "ValueDate": null,
            "AsReported": [],
            "Sources": []
        }
    },
    "ABCUseIndicators": {
        "EnvironmentalControversies": {
            "Value": false,
            "ValueDate": "2021-03-06T23:22:49.870Z"
        },
        "RenewableEnergyUseRatio": {
            "Value": null,
            "ValueDate": null,
            "ValueScore": null
        }
    },
    "XYZDataPoints": {
        "AccountingControversiesCount": {
            "Value": null,
            "ValueDate": null,
            "AsReported": [],
            "Sources": []
        },
        "AdvanceNotices": {
            "Value": null,
            "ValueDate": null,
            "Sources": []
        }
    },
    "XYXIndicators": {
        "AccountingControversies": {
            "Value": false,
            "ValueDate": "2021-03-06T23:22:49.870Z"
        },
        "AntiTakeoverDevicesAboveTwo": {
            "Value": 4,
            "ValueDate": "2021-03-06T23:22:49.870Z",
            "ValueScore": "0.8351945854483925"
        }
    }
}

Expected flattened structure: [screenshot not reproduced here]
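Outside of ADF, the shape of the problem can be illustrated in a few lines of Python. The first two key levels are the dynamic names I need to capture; the leaf property names are fixed. This is only a sketch over a trimmed copy of the sample, not an ADF expression:

```python
# Trimmed copy of the sample document; the first two key levels
# (category and metric names) are dynamic, the leaf names are fixed.
data = {
    "ABCDataPoints": {
        "CementUse": {"Value": None, "ValueDate": None, "ValueScore": None},
        "CoalUse": {"Value": None, "ValueDate": None},
    },
    "ABCUseIndicators": {
        "EnvironmentalControversies": {
            "Value": False,
            "ValueDate": "2021-03-06T23:22:49.870Z",
        },
    },
}

# Enumerate the dynamic names at both levels; these become row values
# in the flattened output.
pairs = [(category, metric)
         for category, metrics in data.items()
         for metric in metrics]
for category, metric in pairs:
    print(category, metric)
```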

  • I'm afraid we cannot parse a JSON object with dynamic keys. If the keys are fixed, we can use OPENJSON or JSON_VALUE in Azure SQL to achieve that. – Joseph Xu May 07 '21 at 03:04

2 Answers


Background: After multiple calls with ADF experts at Microsoft (our workplace has a Microsoft/Azure partnership), they concluded this is not possible with the out-of-the-box activities provided by ADF, neither with Data Flow nor with the Flatten transformation. The reason is that Data Flow/Flatten only unrolls array objects, and there are no mapping functions available to pick up the property names. Custom expressions are in internal beta testing and will be in public preview in the near future.

Conclusion/Solution: Based on those calls with Microsoft engineers, we agreed on two approaches. Both need custom code; without custom code this is not possible using out-of-the-box activities.

Solution-1: Write code to flatten the data as required, run via an ADF Custom Activity. The downside is that you need external compute (VM/Batch), and the supported options are not on-demand, so it is somewhat expensive; it works best for continuous streaming workloads. With this approach you also need to continuously monitor whether the input sources vary in size, because the compute must scale elastically in that case or you will get out-of-memory exceptions.

Solution-2: Still requires custom code, but in a Function App.

  • Create a Copy activity with the JSON files as the source (preferably in a storage account).
  • Use the function's REST endpoint as the sink (not an Azure Function activity, because that has a 90-second timeout when called from an ADF activity).
  • The function app takes JSON lines as input, then parses and flattens them.
  • With this setup you can scale both the number of lines sent in each request and the number of parallel requests.
  • The function flattens the data as required into one or more files and stores them in blob storage.
  • The pipeline continues from there as needed.
  • One problem with this approach: if any of the ranges fails, the Copy activity will retry, but it reruns the whole process.
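The flattening step inside the function app could be sketched as follows. This is a minimal Python sketch, not a tested implementation; the column names (`Category`, `Metric`) and the set of fixed leaf columns are assumptions based on the sample JSON:

```python
import csv
import io
import json

# Fixed leaf property names, per the question's constraint that only
# the first two key levels are dynamic (assumed column set).
FIXED_COLUMNS = ["Value", "ValueDate", "ValueScore", "AsReported"]


def flatten(doc: dict) -> str:
    """Turn the two dynamic key levels into row values and the fixed
    leaf properties into columns; return the result as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Category", "Metric"] + FIXED_COLUMNS)
    for category, metrics in doc.items():
        for metric, props in metrics.items():
            # json.dumps keeps null/false/arrays readable in the CSV cell.
            writer.writerow(
                [category, metric]
                + [json.dumps(props.get(col)) for col in FIXED_COLUMNS]
            )
    return buf.getvalue()


doc = json.loads(
    '{"ABCUseIndicators": {"EnvironmentalControversies": '
    '{"Value": false, "ValueDate": "2021-03-06T23:22:49.870Z"}}}'
)
print(flatten(doc))
```

In the function app this would run per batch of JSON lines, writing the CSV output to blob storage as described above.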

I'm trying something very similar; is there any other or native solution to address this?

The answer above mentions "Custom expressions are in internal beta testing and will be in public preview in the near future." Has this reached general availability yet? If yes, any reference documentation or samples would be of great help!

– Bhushan