1

I want to split strings into columns.

My columns should be:

account_id, resource_type, resource_name

I have a JSON file source that I have been trying to parse via ADF data flow. That hasn't worked for me, hence I flattened the data and brought it into SQL Server (I am open to parsing values via ADF or SQL if anyone can show me how). Please check the JSON file at the bottom.

Use this code to query the data I am working with.

 CREATE TABLE test.test2
 (
     resource_type nvarchar(max) NULL
 )

 INSERT INTO test.test2 ([resource_type]) 
 VALUES 
     ('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
     ('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
     ('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
     ('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')

The output that I should be able to query in SQL Server should like this:

account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups

and so forth.

Please help me out and ask for clarification if needed. Thanks in advance.

EDIT:

Source JSON Format:

{
    "start_date": "2021-12-01 00:00:00+00:00",
    "end_date": "2021-12-31 23:59:59+00:00",
    "resource_type": "all",
    "records": [
        {
            "directconnect_connections": [
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
            ]
        },
        {
            "virtual_interfaces": [
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
                "account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
                "account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
            ]
        }
    ]
}
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
newbie
  • 53
  • 10
  • Let me guess, the column names need to be dynamically generated..? – Thom A Jan 24 '22 at 19:55
  • I am not sure I understand. Column Names are separated by a ':' in the query above. – newbie Jan 24 '22 at 19:56
  • A column's name must be well defined in your SQL, @newbie . Is your data ***always*** just these 3 columns? – Thom A Jan 24 '22 at 19:58
  • Yes, the data is always 3 columns. This is not likely to change. I get the data just like this from another department in a JSON file and have no control over its formatting. Just an FYI – newbie Jan 24 '22 at 20:00
  • Wait, this is actually coming from JSON? If so, why not directly parse the JSON; SQL Server natively supports JSON in all supported versions. – Thom A Jan 24 '22 at 20:01
  • Okay. I am super new to all this and don’t know if I could parse the JSON. Could you please guide me on how? I tried doing that in ADF but no luck. – newbie Jan 24 '22 at 20:03
  • [JSON data in SQL Server](https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15) The above isn't valid JSON, so we can't write an answer based on that. – Thom A Jan 24 '22 at 20:04
  • @dale k. Sure. I tried 2 things. Can edit the question to add those. – newbie Jan 24 '22 at 20:05
  • I can edit the question to add the JSON format? – newbie Jan 24 '22 at 20:07
  • Yes, adding the JSON would be helpful, @newbie . – Thom A Jan 24 '22 at 20:08
  • Please see now @Larnu – newbie Jan 24 '22 at 20:19
  • Put the JSON in your question, @newbie . – Thom A Jan 24 '22 at 20:20
  • Please see now @larnu . – newbie Jan 24 '22 at 20:27
  • Thanks for the edit @larnu . What would be your recommended solution now – newbie Jan 24 '22 at 20:32
  • Oh, so the value inside the JSON isn't even properly formatted..? It's all in a single set of double quotes. I would have *assumed* a single `record` would have set of values like `"account_id":"227148359287","resource_type":"virtual_interfaces,resource_name":"dxvif-ffp6kl3f"`. Then see John's answer, as it's *not* JSON. – Thom A Jan 24 '22 at 20:34
  • You are correct! It should be formatted with quotes either before or after delimiters. But it is not. Its been fetched using a REST API in ADF. Would you want me to insert snippets of how it is being done? – newbie Jan 24 '22 at 20:36
  • No, if the JSON isn't actual JSON, then you simply want the answer from John below. You can't consume invalid JSON like JSON. – Thom A Jan 24 '22 at 20:38

2 Answers2

0

Since you don't have a valid JSON string and not wanting to get in the business of string manipulation... perhaps this will help.

Select B.*
 From  test2 A
 Cross Apply ( Select account_id    = max(case when value like 'account_id:%'    then stuff(value,1,11,'') end )
                     ,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
                     ,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
                from  string_split(resource_type,',') 
             )B

Results

account_id      resource_type   resource_name
224526257458    buckets         camp-stage-artifactory
535533456241    buckets         tni-prod-diva-backups
369798452057    buckets         369798452057-s3-manifests
460085747812    buckets         vessel-incident-report-nonprod-accesslogs
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
0

Unfortunately, the values inside the arrays are not valid JSON. You can patch them up by adding {} to the beginning/end, and adding " on either side of : and ,.

DECLARE @json nvarchar(max) = N'{
    "start_date": "2021-12-01 00:00:00+00:00",
    "end_date": "2021-12-31 23:59:59+00:00",
    "resource_type": "all",
    "records": [
        {
            "directconnect_connections": [
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
            ]
        },
        {
            "virtual_interfaces": [
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
                "account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
                "account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
            ]
        }
    ]
}';


SELECT
  j4.account_id,
  j4.resource_type,
  TRIM('''' FROM j4.resource_name) resource_name
FROM OPENJSON(@json, '$.records') j1
CROSS APPLY OPENJSON(j1.value) j2
CROSS APPLY OPENJSON(j2.value) j3
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","') + '"}')
  WITH (
    account_id bigint,
    resource_type varchar(20),
    resource_name varchar(100)
  ) j4;

db<>fiddle

The first three calls to OPENJSON have no schema, so the resultset is three columns: key value and type. In the case of arrays (j1 and j3), key is the index into the array. In the case of single objects (j2), key is each property name.

Charlieface
  • 52,284
  • 6
  • 19
  • 43