0

I have a nested json data structure like below, end goal was to show the data in Quicksight from Athena. After researching I found out Quicksight cannot show/handle "ARRAY" datatype for visuals.

CREATE EXTERNAL TABLE `testfindings`(  ` 
 `source` string COMMENT 'from deserializer', 
 `account` string COMMENT 'from deserializer',  
 `detail` struct<findings:array<struct<productarn:string,
 resources:array<struct<partition:string,type:string,region:string,id:string>>>>> COMMENT 'from deserializer')
    
    ROW FORMAT SERDE 
      'org.openx.data.jsonserde.JsonSerDe' 

After writing the UNNEST query in Athena to extract id, type from resources array when i try to create dataset in quicksight , i am not able to find the "ARRAY"data only "source", "account" data is shown in quicksight.

After doing some research, I found out that "ARRAY" data is not supported in quicksight so i changed my table schema as below:

CREATE EXTERNAL TABLE `testfindings`(  ` 
  `source` string COMMENT 'from deserializer', 
  `account` string COMMENT 'from deserializer',  
  `detail` string COMMENT 'from deserializer')

ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe'

Now when I do select * from testfindings , i get data in three columns and last column detail has all the data in the array form.

Now, how I can get particular fields from detail column like i wanted to get id, type from resources which are present inside resources array.

`detail` struct<findings:array<struct<productarn:string,
 resources:array<struct<partition:string,type:string,region:string,id:string>>>

I am trying something like this :

select 
source,
account,
resource
from "testfindings" 
cross join UNNEST(testfindings.detail) as p(tr)
cross join UNNEST(tr.findings) as p(it)
cross join UNNEST(it.resources) as p(resource)

It gives me below error:

INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar

Any help is appreciated.

Bokambo
  • 4,204
  • 27
  • 79
  • 130

1 Answers1

0

Because you've cast the array as a string, Athena doesn't know how to deal with the contents of your column.

To fix this, you can use the presto function json_extract, which will parse the data as json/dict, and allow you to access that array, or nested contents within.

from the docs:

SELECT json_extract( my_json_column, '$.store.book') -- the json path is the $.x.y

with your data this would be:

select 
source,
account,
resource
from "testfindings" 
cross join UNNEST(json_extract(testfindings.detail, '$') ) as p(tr) -- assuming this is the only table you need to unnest from struct
cross join UNNEST(tr.findings) as p(it)
cross join UNNEST(it.resources) as p(resource)

Now, that said, what I think your actual issue might be is the compatibility with Quicksight parsing structured data.

To address that issue, you could create a view in Athena which does all your cross join unnesting to the shape you'd like, then reference that It looks like you're correctly deserialising in that initial statement.

Or if the query is intense, it may be beneficial to do a transform in glue as recommended in this stackoverflow question.

TLDR:

  • if you'd like to cross join on stringified json, use presto's json_extract function
  • If Quicksight is having issues accessing arrays, try creating a view of your normalised / cross joined unnested data
  • For optimisation purposes, you can create a transform in glue to materialise the view
SenatorWaffles
  • 311
  • 2
  • 7