3

I am using cloudformation to create a "Virtual_View" of my table. When I use the same view to query data in AWS Athena Console it works fine and returns the data but When I try to use the same view as a dataset in AWS QuickSight(using SPICE) it throws following error:

"Unable to prepare this table.
Please try again or choose another table."

If I choose to run it using "Query" in Quicksight I get following error:

region: us-east-1 
timestamp:  1558718487000
requestId:  58e18321-7e48-11e9-9740-618021a5eae5 
sourceErrorCode:    0
sourceErrorMessage: [Simba][JDBC](11380) Null pointer exception.
sourceErrorState:   HY000 
sourceException:    java.sql.SQLException
sourceType: ATHENA

Funny part is If I modify my view by using "show/edit query" option in Athena Web Interface and run the "Alter" view command against my view without changing anything .. it starts working fine on quick sight. this leads me to believe that the creation of View using my cloud formation is missing something or perhaps some thing else? Here is my cloudformation template that I am using to create db + table + view.

AWSTemplateFormatVersion: 2010-09-09
Description: Glue Athena database and table configuration

Parameters:
  Stage:
    Description: Stage name (dev, prod)
    Type: String
    MinLength: 3
  PartitionKey:
    Description: Patition key for the table (dont use dashes)
    Type: String
    Default: "modkey"
    MinLength: 3

Resources:
  GlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: !Sub
          - db_${Stage}_glue
          - 
            Stage: !Ref Stage
      CatalogId: !Ref AWS::AccountId

  GlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref GlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: tbl_request
        TableType: EXTERNAL_TABLE
        Parameters:
          CrawlerSchemaDeserializerVersion: "1.0"
          CrawlerSchemaSerializerVersion: "1.0"
          classification: json
          compressionType: none
          typeOfData: file
        PartitionKeys:
        # Data is partitioned by this key
        - Name: !Ref PartitionKey
          Type: string
        StorageDescriptor:
          Compressed: false
          Location:
            Fn::Join:
              - ''
              - - 's3://'
                - Fn::ImportValue:
                    !Sub
                      - requests-${Stage}-s3
                      -
                        Stage: !Ref Stage
                - '/'
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          StoredAsSubDirectories: false
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          SerdeInfo:
            Parameters: {paths: 'Id,Module,Organization,Redirect,RequestTime,Suppressed,Template,TemplateData,ToAddresses,ToAddress,Events'}
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          Columns:
          - {Name: id, Type: string}
          - {Name: organization, Type: string}
          - {Name: module, Type: string}
          - {Name: requesttime, Type: string}
          - {Name: templatedata, Type: string}
          - {Name: template, Type: string}
          - {Name: toaddress, Type: string}
          - {Name: toaddresses, Type: array<string>}
          - {Name: suppressed, Type: array<string>}
          - {Name: events, Type: array<string>}
          - {Name: redirect, Type: array<string>}

  ViewDeliverySample:
    Type: AWS::Glue::Table
    DependsOn: GlueTable
    Properties:
      DatabaseName: !Ref GlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: tbl_request_view
        TableType: VIRTUAL_VIEW
        ViewOriginalText: 
          Fn::Join:
            - ''
            - - '/* Presto View: '
              - Fn::Base64: !Sub 
                  - |
                      {
                      "originalSql": "WITH dataset AS ( WITH requests_dataset AS (SELECT * FROM ${TableName} ), basedataset AS (SELECT id, module, ${PartitionKey}, CAST( json_extract(event, '$.eventtype') AS VARCHAR ) AS eventtype, event AS detail FROM requests_dataset CROSS JOIN unnest(events) AS t(event) ), send_dataset AS (SELECT email, module, ${PartitionKey}, eventtype, CAST(json_extract(detail, '$.mail.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.mail.destination') AS ARRAY(VARCHAR))) AS t(email) WHERE eventtype = 'Send' ), delivery_dataset AS (SELECT email, module, ${PartitionKey}, eventtype, CAST(json_extract(detail, '$.delivery.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.delivery.recipients') AS ARRAY(VARCHAR))) AS t(email) WHERE eventtype = 'Delivery' ), bounce_dataset AS (SELECT CAST(rr['emailaddress'] AS VARCHAR )as email, module,${PartitionKey}, eventtype, CAST(json_extract(detail,'$.bounce.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.bounce.bouncedrecipients') AS ARRAY(MAP(VARCHAR,JSON))) ) AS t(rr) WHERE eventtype='Bounce' ), suppress_dataset AS (SELECT email, module, ${PartitionKey}, 'suppress' AS eventtype, requesttime AS time, id FROM requests_dataset CROSS JOIN unnest(suppressed) AS t(email) ) SELECT * FROM send_dataset UNION SELECT * FROM delivery_dataset UNION SELECT * FROM bounce_dataset UNION SELECT * FROM suppress_dataset ) SELECT * FROM dataset ORDER BY email, module, eventtype, time",
                      "catalog": "awsdatacatalog",
                      "schema": "${DatabaseName}",
                      "columns": [
                        {
                          "name": "email",
                          "type": "varchar"
                        },
                        {
                          "name": "module",
                          "type": "varchar"
                        },
                        {
                          "name": "modkey",
                          "type": "varchar"
                        },
                        {
                          "name": "eventtype",
                          "type": "varchar"
                        },
                        {
                          "name": "time",
                          "type": "varchar"
                        },
                        {
                          "name": "id",
                          "type": "varchar"
                        }
                      ]
                      }
                  - { 
                      DatabaseName: !Ref GlueDatabase,
                      TableName: !Ref GlueTable,
                      PartitionKey: !Ref PartitionKey
                    }
              - ' */'
        ViewExpandedText: '/* Presto View */'
        Parameters:
          presto_view: true
          comment: "Presto View"
        StorageDescriptor:
          Compressed: false
          StoredAsSubDirectories: false
          SerdeInfo:
            Parameters: {paths: 'email,module,modkey,eventtype,time,id'}
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          Columns:
          - {Name: email, Type: string}
          - {Name: module, Type: string}
          - {Name: modkey, Type: string}
          - {Name: eventtype, Type: string}
          - {Name: time, Type: string}
          - {Name: id, Type: string}
Sohail Ali
  • 81
  • 8

2 Answers2

3

able to fix by taking out serdeinfo and adding empty partitionkey array.

ViewDeliverySample:
    Description: some description here  # change this
    Type: AWS::Glue::Table
    DependsOn: GlueTable
    Properties:
      DatabaseName: !Ref GlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: tbl_request_view
        TableType: VIRTUAL_VIEW
        Parameters:
          presto_view: true
        PartitionKeys: []
        ViewOriginalText: 
          Fn::Join:
            - ''
            - - '/* Presto View: '
              - Fn::Base64: !Sub 
                  - |
                      {
                      "originalSql": "my sql query here",
                      "catalog": "awsdatacatalog",
                      "schema": "${DatabaseName}",
                      "columns": [
                        {
                          "name": "email",
                          "type": "varchar"
                        },
                        {
                          "name": "module",
                          "type": "varchar"
                        },
                        {
                          "name": "${PartitionKey}",
                          "type": "varchar"
                        },
                        {
                          "name": "eventtype",
                          "type": "varchar"
                        },
                        {
                          "name": "time",
                          "type": "varchar"
                        },
                        {
                          "name": "id",
                          "type": "varchar"
                        }
                      ]
                      }
                  - { 
                      DatabaseName: !Ref GlueDatabase,
                      TableName: !Ref GlueTable,
                      PartitionKey: !Ref PartitionKey
                    }
              - ' */'
        ViewExpandedText: '/* Presto View */'
        StorageDescriptor:
          SerdeInfo: {}
          Columns:
          - {Name: email, Type: string}
          - {Name: module, Type: string}
          - {Name: eventtype, Type: string}
          - {Name: time, Type: string}
          - {Name: id, Type: string}
Sohail Ali
  • 81
  • 8
0

I had a similar issue, with a different resolution. Posting my answer for others who run into this since there isn't much on Google.

My full error in the Quicksight UI was:

region: eu-west-1
timestamp: 1678190654189
requestId: <redacted>
sourceErrorCode: 0
sourceErrorMessage: [Simba][JDBC](11380) Null pointer exception.
sourceErrorState: HY000
sourceException: java.sql.SQLException
sourceType: ATHENA

The root cause was two things:

  1. KMS-encrypted S3 bucket for Athena results.
  2. KMS-encrypted Glue Catalog.

To debug #1

I noticed in Athena's query history that the simple select 1 query would fail when it was issued from Quicksight, leading to the above error, but the query worked fine if I ran it manually on Athena. On the Athena history query I could see more detail for the failed query and this helped to identify that the root cause was KMS encrypted data.

To resolve this, I attached an additional custom policy to the quicksight role giving KMS permissions. (Note that if you change Security Settings in Quicksight after this it will refuse to apply any changes until you detach the custom policy. It can then be re-attached.)

To debug #2

I was still getting the same issue in Quicksight, to further debug I created a custom SQL query data source and used a simple SELECT * FROM mytable LIMIT 10 that I copied from the Athena query editor using table preview. This custom SQL data source still failed in Quicksight, but produced a useful error message:

region: eu-west-1
timestamp: 1678190771983
requestId: <redacted>
sourceErrorCode: 100071
sourceErrorMessage: [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access. (Service: AWSKMS; Status Code: 400; Error Code: AccessDeniedException; Request ID: <redacted>; Proxy: null) (Service: AWSGlue; Status Code: 400; Error Code: GlueEncryptionException; Request ID: <redacted>; Proxy: null) [Execution ID: <redacted>]
sourceErrorState: HY000
sourceException: java.sql.SQLException
sourceType: ATHENA

Note that the error comes from the Glue service, which indicated to me that an encrypted Glue Catalog was the issue. After adding the relevant kms permissions to the custom policy I had attached to the Quicksight role, the issue was finally resolved.

The kms actions I needed to add for both KMS keys were kms:Decrypt and kms:GenerateDataKey.

BjornO
  • 851
  • 8
  • 16