I defined an AWS::Glue::Table and an AWS::KinesisFirehose::DeliveryStream in a SAM template for AWS CloudFormation (see the code below). The delivery stream writes error files to the S3 output bucket, containing the following error: "errorCode":"DataFormatConversion.InvalidSchema","errorMessage":"The schema is invalid. The specified table has no columns."

When I inspect the table, I can see the columns. What could be causing the error?

The template

I omitted much of the detail and kept only what I thought was essential. As you can see, I defined a schema, a table and a delivery stream; the database is defined in the omitted code.

# template.yaml #

Resources:
  ...

  myTable:
    Type: AWS::Glue::Table
    Properties: 
      ...
      CatalogId: !Ref AWS::AccountId
      DatabaseName: myDatabase
      TableInput:
        Name: myTable
        StorageDescriptor:
          SchemaReference: 
            SchemaVersionId: !GetAtt mySchema.InitialSchemaVersionId

  mySchema:
    Type: AWS::Glue::Schema
    Properties:
      ...
      CheckpointVersion: 
        IsLatest: true
        VersionNumber: 1
      Compatibility: BACKWARD
      DataFormat: AVRO
      Name: myTable
      SchemaDefinition: |
        {
          "type": "record",
          "namespace": "myDatabase",
          "name": "myTable",
          "fields": [
            {
              "name": "field1",
              "type": "string"
            },
            {
              "name": "field2",
              "type": "int"
            }
          ]
        }


  myDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      ExtendedS3DestinationConfiguration:
          ...
          SchemaConfiguration:
            Region: eu-central-1
            DatabaseName: myDatabase
            TableName: myTable
            CatalogId: !Ref AWS::AccountId
            VersionId: "0"
            RoleARN: !GetAtt FirehoseAccessRole.Arn
Gabriele

1 Answer

As it turns out from this answer (which concerns Terraform templates), Kinesis Data Firehose (KDF) «is unable to read table's schema if table is created from existing schema».

The following template resolves the error: the table now declares its columns explicitly instead of referencing a schema. Note that the table version in the Firehose section has to be changed as well, increasing it to the number that corresponds to the changes just made. I had to look it up directly in the console; as far as I know, there is no way to force a number or to retrieve the latest version from within the template. If the version is not changed, the same error appears as before, because the Firehose stream still refers to the old version.

# template.yaml #

Resources:
  ...

  myTable:
    Type: AWS::Glue::Table
    Properties: 
      ...
      CatalogId: !Ref AWS::AccountId
      DatabaseName: myDatabase
      TableInput:
        Name: myTable
        StorageDescriptor:
          Columns:
            - Name: daytime
              Type: string
            - Name: step_code
              Type: int
            - Name: id_spot
              Type: int
            - Name: id_phase
              Type: int
            - Name: duration
              Type: int
            - Name: n_sample
              Type: int

  myDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      ExtendedS3DestinationConfiguration:
          ...
          SchemaConfiguration:
            Region: eu-central-1
            DatabaseName: myDatabase
            TableName: myTable
            CatalogId: !Ref AWS::AccountId
            VersionId: "1"  # CHANGE THIS ACCORDINGLY!
            RoleARN: !GetAtt FirehoseAccessRole.Arn
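
A possible way to avoid hard-coding the version number, which I have not tested against this exact setup: the CloudFormation documentation for SchemaConfiguration states that VersionId also accepts the value LATEST (and defaults to it when the property is omitted), in which case Firehose uses the most recent table version. The fragment below is only a sketch under that assumption; everything except the VersionId line is copied from the template above, and the elided parts stay elided.

```yaml
# template.yaml (fragment, untested sketch) #

  myDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      ExtendedS3DestinationConfiguration:
          ...
          SchemaConfiguration:
            Region: eu-central-1
            DatabaseName: myDatabase
            TableName: myTable
            CatalogId: !Ref AWS::AccountId
            # Assumption: LATEST makes Firehose follow the newest table version,
            # so the number does not have to be looked up in the console.
            VersionId: LATEST
            RoleARN: !GetAtt FirehoseAccessRole.Arn
```

If this works as documented, later changes to the table columns are picked up without editing the template. Pinning an explicit number, as above, remains the safer choice when the output schema must not change silently.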
Gabriele