I'm trying to use Firehose to ingest some data. Here are the parameters:

  • JSON data with its schema in the Glue Schema Registry
  • Want to convert JSON -> Parquet

From this post, it seems that Firehose cannot read a table's schema if the table was created from an existing schema. Can anyone confirm this? As in that post, I also get the error message:

The schema is invalid. The specified table has no columns.

My other options are to use a crawler or to create the table manually. I'd like to name the table myself, so I want to go with the latter.

Is there a way to have Firehose update the schema of a manually created table in Glue, or is the crawler my only option?

I could also just do the Parquet conversion myself in a Glue job, but I'd rather have Firehose do it if possible.

Sogun

1 Answer

This is possible, but it takes a somewhat hacky workaround: create a second table that copies its columns from the first table (the one created from the Glue schema). You can then reference that second table in your Firehose configuration to convert to the output format you want:

# First table: its columns come from a Glue Schema Registry reference.
# This is the kind of table Firehose rejects in the question.
resource "aws_glue_catalog_table" "table_from_schema" {
  name          = "first_table"
  database_name = "foo"
  storage_descriptor {
    schema_reference {
      schema_id {
        schema_arn = aws_glue_schema.foo_schema.arn
      }
      schema_version_number = aws_glue_schema.foo_schema.latest_schema_version
    }
  }
}

# Second table: copies every column of the first table into explicit
# columns blocks, so the schema is stored on the table itself.
resource "aws_glue_catalog_table" "table_from_first_table_that_can_be_used_with_firehose" {
  name          = "second_table"
  database_name = "foo"
  storage_descriptor {
    dynamic "columns" {
      for_each = aws_glue_catalog_table.table_from_schema.storage_descriptor[0].columns
      content {
        name = columns.value.name
        type = columns.value.type
      }
    }
  }
}
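
For completeness, here is a rough sketch of how the second table might be referenced from the Firehose side for JSON -> Parquet conversion. It is not part of my actual setup: the stream name, `aws_iam_role.firehose`, and `aws_s3_bucket.destination` are hypothetical placeholders assumed to be defined elsewhere, and the role is assumed to have read access to the Glue table.

```hcl
# Sketch only: the role and bucket below are hypothetical and must exist elsewhere.
resource "aws_kinesis_firehose_delivery_stream" "json_to_parquet" {
  name        = "json-to-parquet" # hypothetical name
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.firehose.arn      # assumed IAM role
    bucket_arn = aws_s3_bucket.destination.arn  # assumed S3 bucket

    # Format conversion requires a buffer size of at least 64 MB.
    buffering_size = 128

    data_format_conversion_configuration {
      # Parse incoming records as JSON.
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {}
        }
      }

      # Write the converted records as Parquet.
      output_format_configuration {
        serializer {
          parquet_ser_de {}
        }
      }

      # Point Firehose at the second table, which has explicit columns.
      schema_configuration {
        database_name = aws_glue_catalog_table.table_from_first_table_that_can_be_used_with_firehose.database_name
        table_name    = aws_glue_catalog_table.table_from_first_table_that_can_be_used_with_firehose.name
        role_arn      = aws_iam_role.firehose.arn
      }
    }
  }
}
```

Because the second table derives its columns from the first, a new schema version should only need a `terraform apply` to flow through to the table that Firehose reads.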
randal25