I am using cloudformation to create a "Virtual_View" of my table. When I use the same view to query data in AWS Athena Console it works fine and returns the data but When I try to use the same view as a dataset in AWS QuickSight(using SPICE) it throws following error:
"Unable to prepare this table.
Please try again or choose another table."
If I choose to run it using "Query" in Quicksight I get following error:
region: us-east-1
timestamp: 1558718487000
requestId: 58e18321-7e48-11e9-9740-618021a5eae5
sourceErrorCode: 0
sourceErrorMessage: [Simba][JDBC](11380) Null pointer exception.
sourceErrorState: HY000
sourceException: java.sql.SQLException
sourceType: ATHENA
Funny part is If I modify my view by using "show/edit query" option in Athena Web Interface and run the "Alter" view command against my view without changing anything .. it starts working fine on quick sight. this leads me to believe that the creation of View using my cloud formation is missing something or perhaps some thing else? Here is my cloudformation template that I am using to create db + table + view.
AWSTemplateFormatVersion: 2010-09-09
Description: Glue Athena database and table configuration
Parameters:
Stage:
Description: Stage name (dev, prod)
Type: String
MinLength: 3
PartitionKey:
Description: Patition key for the table (dont use dashes)
Type: String
Default: "modkey"
MinLength: 3
Resources:
GlueDatabase:
Type: AWS::Glue::Database
Properties:
DatabaseInput:
Name: !Sub
- db_${Stage}_glue
-
Stage: !Ref Stage
CatalogId: !Ref AWS::AccountId
GlueTable:
Type: AWS::Glue::Table
Properties:
DatabaseName: !Ref GlueDatabase
CatalogId: !Ref AWS::AccountId
TableInput:
Name: tbl_request
TableType: EXTERNAL_TABLE
Parameters:
CrawlerSchemaDeserializerVersion: "1.0"
CrawlerSchemaSerializerVersion: "1.0"
classification: json
compressionType: none
typeOfData: file
PartitionKeys:
# Data is partitioned by this key
- Name: !Ref PartitionKey
Type: string
StorageDescriptor:
Compressed: false
Location:
Fn::Join:
- ''
- - 's3://'
- Fn::ImportValue:
!Sub
- requests-${Stage}-s3
-
Stage: !Ref Stage
- '/'
InputFormat: org.apache.hadoop.mapred.TextInputFormat
StoredAsSubDirectories: false
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
SerdeInfo:
Parameters: {paths: 'Id,Module,Organization,Redirect,RequestTime,Suppressed,Template,TemplateData,ToAddresses,ToAddress,Events'}
SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
Columns:
- {Name: id, Type: string}
- {Name: organization, Type: string}
- {Name: module, Type: string}
- {Name: requesttime, Type: string}
- {Name: templatedata, Type: string}
- {Name: template, Type: string}
- {Name: toaddress, Type: string}
- {Name: toaddresses, Type: array<string>}
- {Name: suppressed, Type: array<string>}
- {Name: events, Type: array<string>}
- {Name: redirect, Type: array<string>}
ViewDeliverySample:
Type: AWS::Glue::Table
DependsOn: GlueTable
Properties:
DatabaseName: !Ref GlueDatabase
CatalogId: !Ref AWS::AccountId
TableInput:
Name: tbl_request_view
TableType: VIRTUAL_VIEW
ViewOriginalText:
Fn::Join:
- ''
- - '/* Presto View: '
- Fn::Base64: !Sub
- |
{
"originalSql": "WITH dataset AS ( WITH requests_dataset AS (SELECT * FROM ${TableName} ), basedataset AS (SELECT id, module, ${PartitionKey}, CAST( json_extract(event, '$.eventtype') AS VARCHAR ) AS eventtype, event AS detail FROM requests_dataset CROSS JOIN unnest(events) AS t(event) ), send_dataset AS (SELECT email, module, ${PartitionKey}, eventtype, CAST(json_extract(detail, '$.mail.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.mail.destination') AS ARRAY(VARCHAR))) AS t(email) WHERE eventtype = 'Send' ), delivery_dataset AS (SELECT email, module, ${PartitionKey}, eventtype, CAST(json_extract(detail, '$.delivery.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.delivery.recipients') AS ARRAY(VARCHAR))) AS t(email) WHERE eventtype = 'Delivery' ), bounce_dataset AS (SELECT CAST(rr['emailaddress'] AS VARCHAR )as email, module,${PartitionKey}, eventtype, CAST(json_extract(detail,'$.bounce.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.bounce.bouncedrecipients') AS ARRAY(MAP(VARCHAR,JSON))) ) AS t(rr) WHERE eventtype='Bounce' ), suppress_dataset AS (SELECT email, module, ${PartitionKey}, 'suppress' AS eventtype, requesttime AS time, id FROM requests_dataset CROSS JOIN unnest(suppressed) AS t(email) ) SELECT * FROM send_dataset UNION SELECT * FROM delivery_dataset UNION SELECT * FROM bounce_dataset UNION SELECT * FROM suppress_dataset ) SELECT * FROM dataset ORDER BY email, module, eventtype, time",
"catalog": "awsdatacatalog",
"schema": "${DatabaseName}",
"columns": [
{
"name": "email",
"type": "varchar"
},
{
"name": "module",
"type": "varchar"
},
{
"name": "modkey",
"type": "varchar"
},
{
"name": "eventtype",
"type": "varchar"
},
{
"name": "time",
"type": "varchar"
},
{
"name": "id",
"type": "varchar"
}
]
}
- {
DatabaseName: !Ref GlueDatabase,
TableName: !Ref GlueTable,
PartitionKey: !Ref PartitionKey
}
- ' */'
ViewExpandedText: '/* Presto View */'
Parameters:
presto_view: true
comment: "Presto View"
StorageDescriptor:
Compressed: false
StoredAsSubDirectories: false
SerdeInfo:
Parameters: {paths: 'email,module,modkey,eventtype,time,id'}
SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
Columns:
- {Name: email, Type: string}
- {Name: module, Type: string}
- {Name: modkey, Type: string}
- {Name: eventtype, Type: string}
- {Name: time, Type: string}
- {Name: id, Type: string}