2

I have a nested json that I am trying to flatten in usql. I cannot share the data, but the structure is similar to this.

{
    "userlist": [user1, user1],
    "objects": {
        "largeobjects": [object1, object2, object3]
        "smallobjects": [s_object1, s_object2]
    },
    "applications": [{
            "application": sdq3ds5dsa
        }, {
            "application": dksah122j4
        }, {
            "application": sadsw2dq2s
        }, {
            "application": pro3dfdsn3
        }
    ],
    "date" : 12344232,
    "timezone" : "Asia",
    "id" : "sad2ddssa2",
    "admin": {
        "lang": "eng",
        "country": "us",
    }
}

I am using the custom jsonoutputter (https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats) to extract from the json file and the jsontuple function to extract the values. My problem is that the function uses sql map which generates key value pairs. This works for the situations where I have a key, but it throws an error when I try to use that function to get the values from the no key array.

Any suggestion regarding how to solve this would be greatly appreciated.

EDIT This is the output I am looking after:

sad2ddssa2, object1, 12344232, "Asia", "eng", "us",
sad2ddssa2, object2, 12344232, "Asia", "eng", "us"

2 Answers2

1

First option

Try to use PROSE within your u-sql. Use the PROSE's c# nuget to process data and do complex extractions. This is a very powerful AI package. See the videos and examples here: https://microsoft.github.io/prose

Second option

Create a c# function to process your json. Something like this one, adapt this sample to your custom extraction request using c# json api's:

/* Formats the array of values into a named json array. */

DECLARE @JsonArray Func<SqlArray<string>, string, string> = (data, name) => 
    {
        StringBuilder buffer = new StringBuilder();

        buffer.Append("{\r\n\t\"" + name + "\": [\r\n");

        for (int i = 0; i < data.Count(); i++)
        {
            if (i > 0)
            {
                buffer.Append(",\r\n");
            }

            buffer.Append("\t\"" + data[i] + "\"");
        }

        buffer.Append("\r\n\t]\r\n}");

        return buffer.ToString();
    };

/* Format the array containing groups of comma separated values into a named json array */

@Query = 
    SELECT
        @JsonArray(SubscriptionArray, "subscriptionList") AS JsonArray
    FROM @subscriptionsQuery1;

Third option

Try this approach, after adapting it to your needs:

/* For each json line create a json map (SqlMap) */

@subscriptionsQuery1 = 
    SELECT 
           JsonFunctions.JsonTuple(JsonLine) AS JsonMap
    FROM @SubscriptionsExtractor AS t;

/* For each json map get the required property value */

@subscriptionsQuery1 = 
    SELECT DISTINCT
           JsonMap["alias"] AS Subscription
    FROM @subscriptionsQuery1 AS t;

/* Join the value of all rows into a single row containing an array of all values */

@subscriptionsQuery1 = 
    SELECT
        ARRAY_AGG<string>(Subscription) AS SubscriptionArray
    FROM @subscriptionsQuery1 AS t;
Miguel Domingues
  • 440
  • 3
  • 11
  • Thank you, PROSE looks very interesting for some tasks I have later on. The third aproach is what I tried to do, but json tuple expects a sql.map, and my array only has values no keys and produces an sql.array which is not compatible, hence my issues. I ended up writing custom code that does the flatenning. I am sure it's not the most elegant solution, and perhaps I could have figured out how to use the existing function, but it works. I will accept the answer however because PROSE will prove to be useful to me in the future. – Cristian Iosub Aug 23 '18 at 09:28
  • You can also try the second approach. Parse your json string with Json.NET api and you don't need to flat the file. – Miguel Domingues Aug 23 '18 at 13:35
  • Interesting approaches @MiguelDomingues +10. – wBob Aug 23 '18 at 23:31
1

I was able to get this to work using the NewtonSoft MultiLevelJsonExtractor extractor and this fixed-up JSON file:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @inputFile string = @"\input\yourInputJSON.json";
DECLARE @outputFile string = @"\output\output.csv";


@input =
    EXTRACT id string,
            largeobjects string,
            date string,
            timezone string,
            lang string,
            country string
    FROM @inputFile
    USING new MultiLevelJsonExtractor("objects", false,
          "id",
          "largeobjects",
          "date",
          "timezone",
          "admin.lang",
          "admin.country"
          );

// Convert the JSON column to SQL MAP to multiple rows
@working =
    SELECT id,
           JsonFunctions.JsonTuple(largeobjects).Values AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @input;


// Explode the JSON SQL MAP
@output =
    SELECT id,
           x.y AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @working
         CROSS APPLY
             EXPLODE(largeobject) AS x(y);


OUTPUT @output
TO @outputFile
USING Outputters.Csv(quoting : false);

My results:

My results

I would say this is probably a bit safer than using a roll-your-own method as the NewtonSoft library is specifically for manipulating JSON and is tried and tested.

wBob
  • 13,710
  • 3
  • 20
  • 37