3

Today, I saw myself with a simple problem, renaming column of an Athena glue table from old to new name.

First thing, I search here and tried some solutions like this, this, and many others. Unfortunately, none works, so I decided to use my knowledge and imagination.

I'm posting this question with the intention of share, but also, with the intention to get how others did and maybe find out I reinvented the wheel. So please also share your way if you know how to do it.

My setup is, a Athena JSON table partitioned by day with valuable and enormous amount of data, the infrastructure is defined and updated through Cloudformation.

How to rename an Athena column and still keep the data?

Roelant
  • 4,508
  • 1
  • 32
  • 62
Nicollas Braga
  • 802
  • 7
  • 27

1 Answers1

3

Explaining without all the cloudformation infrastructure.

Imagine a table containing:

  • userId
  • score
  • otherColumns
  • eventDateUtc
  • dt_utc

Partitioned by dt_utc and stored using JSON format. Wee need to change the column score to deltaScore.

Keep in mind, although I haven't tested with others format/configurations, this should apply to any configuration supported by athena as we are going to use athena algorithm to do the job for us.

How to do

if you run the cloudformation migration first, you gonna "lose" access to the dropped column.
but you can simply rename the column back and the data appears.

Those are the steps required for rename a AWS Athena table:

  1. Create a temporary table mapping the old column name to the new one:
    This can be done by use of CREATE TABLE AS, read more in the aws docs
    With this command, we use Athena engine to apply the transformation on the files of the original table for us and save at s3://bucket_name/A_folder/temp_table_rename/.
CREATE TABLE "temp_table_rename"
WITH(
  format = 'JSON',
  external_location = 's3://bucket_name/A_folder/temp_table_rename/',
  partitioned_by = ARRAY['dt_utc']
)
AS
 SELECT DISTINCT
   userid,
   score as deltascore,
   otherColumns,
   eventDateUtc,
   "dt_utc"
   FROM "my_database"."original_table"
  1. Apply the database rename by running the cloudformation with the changes or on the way you have.
    At this point, you can even drop the original_table, and create again using the right column name.
    After rename, you will notice that the renamed column have no data.

  2. Remove the data of the original table by deleting it's s3 source.

  3. Copy the data from the temp table source to the original table source
    I prefer to use a aws command as, there can be thousands of files to copy

aws s3 cp s3://bucket_name/A_folder/temp_table_rename/ s3://bucket_name/A_folder/original_table/ --recursive

  1. Restore the index of the original table MSCK REPAIR TABLE "my_database"."original_table"

done.

Final notes:

Using CREATE TABLE AS to do the transformation job, allow you to do much more than only renaming the column, for example split the data of a column into 2 new columns, or merge it to a single one.

Roelant
  • 4,508
  • 1
  • 32
  • 62
Nicollas Braga
  • 802
  • 7
  • 27