I am using AWS Athena to do some queries on AWS CloudTrail data object log entries.
The first few fields in a typical log entry look like this (pretty-printed for clarity):
{
"Records": [
{
"eventVersion": "1.08",
"userIdentity": {
"type": "AWSAccount",
"principalId": "",
"accountId": "ANONYMOUS_PRINCIPAL"
},
"eventTime": "2021-03-23T14:04:38Z",
"eventSource": "s3.amazonaws.com",
"eventName": "GetObject",
"awsRegion": "us-east-1",
"sourceIPAddress": "12.34.45.56",
"userAgent": "[Amazon CloudFront]",
"requestParameters": {
"bucketName": "mybucket",
"Host": "mybucket.s3.amazonaws.com",
"key": "bin/some/path/to/a/file"
},
"responseElements": null,
...
The AWS CloudTrail console will create a standard table to query these entries. The table is defined as this:
CREATE EXTERNAL TABLE `cloudtrail_logs_mybucket_logs`(
`eventversion` string COMMENT 'from deserializer',
`useridentity` struct<type:string,principalid:string,arn:string,accountid:string,invokedby:string,accesskeyid:string,username:string,sessioncontext:struct<attributes:struct<mfaauthenticated:string,creationdate:string>,sessionissuer:struct<type:string,principalid:string,arn:string,accountid:string,username:string>>> COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventsource` string COMMENT 'from deserializer',
`eventname` string COMMENT 'from deserializer',
`awsregion` string COMMENT 'from deserializer',
`sourceipaddress` string COMMENT 'from deserializer',
`useragent` string COMMENT 'from deserializer',
`errorcode` string COMMENT 'from deserializer',
`errormessage` string COMMENT 'from deserializer',
`requestparameters` string COMMENT 'from deserializer',
`responseelements` string COMMENT 'from deserializer',
`additionaleventdata` string COMMENT 'from deserializer',
`requestid` string COMMENT 'from deserializer',
`eventid` string COMMENT 'from deserializer',
`resources` array<struct<arn:string,accountid:string,type:string>> COMMENT 'from deserializer',
`eventtype` string COMMENT 'from deserializer',
`apiversion` string COMMENT 'from deserializer',
`readonly` string COMMENT 'from deserializer',
`recipientaccountid` string COMMENT 'from deserializer',
`serviceeventdetails` string COMMENT 'from deserializer',
`sharedeventid` string COMMENT 'from deserializer',
`vpcendpointid` string COMMENT 'from deserializer')
COMMENT 'CloudTrail table for adafruit-circuit-python-logs bucket'
ROW FORMAT SERDE
'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT
'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mybucket/AWSLogs/12345678901234/CloudTrail'
TBLPROPERTIES (
'classification'='cloudtrail',
'transient_lastDdlTime'='1616514617')
Note that useridentity
is described as a struct
, but requestParameters
is a string
. I would like to use the struct
feature to preparse requestParameters
, so I tried this:
CREATE EXTERNAL TABLE `cloudtrail_logs_mybucket_logs2`(
`eventversion` string COMMENT 'from deserializer',
`useridentity` struct<type:string,principalid:string,arn:string,accountid:string,invokedby:string,accesskeyid:string,username:string,sessioncontext:struct<attributes:struct<mfaauthenticated:string,creationdate:string>,sessionissuer:struct<type:string,principalid:string,arn:string,accountid:string,username:string>>> COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventsource` string COMMENT 'from deserializer',
`eventname` string COMMENT 'from deserializer',
`awsregion` string COMMENT 'from deserializer',
`sourceipaddress` string COMMENT 'from deserializer',
`useragent` string COMMENT 'from deserializer',
`errorcode` string COMMENT 'from deserializer',
`errormessage` string COMMENT 'from deserializer',
`requestparameters` struct<`bucketName`:string, `Host`:string, `key`:string> COMMENT 'THIS IS NEW',
...[rest same as above]
The table is created, but trying to do a simple query using it ("Preview Table") gives this error:
GENERIC_INTERNAL_ERROR: parent builder is null
What's wrong with my attempt to use struct
on requestparameters
? It seems no different in terms of JSON as to what's going on with useridentity
.