I know I can easily use the AWS Glue console to do this, but I am just trying to do it through the AWS CLI instead. I have a my_table_name table with an id column that is currently type string. However, I would like to change the type to bigint.
My current attempt is the code below. First, I get the tableinput from get-table and change the 3rd column (id) to bigint. Then, I update the Glue table with the modified tableinput as follows:
#!/bin/bash
# Pull the current table definition and switch the 3rd column (id) to bigint
tableinput=$( aws glue get-table \
    --database-name $databasename \
    --name $tablename \
    | json Table \
    | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )
# Write the modified table definition back to the Glue Data Catalog
aws glue update-table \
    --database-name $databasename \
    --name $tablename \
    --table-input $tableinput
For reference, echo $tableinput gets me this JSON:
{ "Name": "my_table_name", "DatabaseName": "my_database_name", "CreateTime": "my_date", "UpdateTime": "my_date", "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "kind", "Type": "string" }, { "Name": "etag", "Type": "string" }, { "Name": "id", "Type": "bigint" }, { "Name": "snippet_channelid", "Type": "string" }, { "Name": "snippet_title", "Type": "string" }, { "Name": "snippet_assignable", "Type": "boolean" } ], "Location": "my_location", "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", "Compressed": true, "NumberOfBuckets": -1, "SerdeInfo": { "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe", "Parameters": { "serialization.format": "1" } }, "BucketColumns": [], "SortColumns": [], "Parameters": { "CrawlerSchemaDeserializerVersion": "1.0", "classification": "parquet", "compressionType": "snappy", "typeOfData": "file" }, "StoredAsSubDirectories": false }, "PartitionKeys": [], "TableType": "EXTERNAL_TABLE", "Parameters": { "classification": "parquet", "compressionType": "snappy", "projection.enabled": "false", "typeOfData": "file" }, "CreatedBy": "my_role", "IsRegisteredWithLakeFormation": false, "CatalogId": "my_catalog_id", "VersionId": "0" }
However, I am getting this error:
Unknown options: --name, "Name":, "my_table_name",, "DatabaseName":, "my_database_name",, "CreateTime":, "my_date",, "UpdateTime":, "my_date",, "Retention":, 0,, "StorageDescriptor":, {, "Columns":, [, {, "Name":, "kind",, "Type":, "string", },, {, "Name":, "etag",, "Type":, "string", },, {, "Name":, "id",, "Type":, "bigint", },, {, "Name":, "snippet_channelid",, "Type":, "string", },, {, "Name":, "snippet_title",, "Type":, "string", },, {, "Name":, "snippet_assignable",, "Type":, "boolean", }, ],, "Location":, "s3://my_location",, "InputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",, "OutputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",, "Compressed":, true,, "NumberOfBuckets":, -1,, "SerdeInfo":, {, "SerializationLibrary":, "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",, "Parameters":, {, "serialization.format":, "1", }, },, "BucketColumns":, [],, "SortColumns":, [],, "Parameters":, {, "CrawlerSchemaDeserializerVersion":, "1.0",, "classification":, "parquet",, "compressionType":, "snappy",, "typeOfData":, "file", },, "StoredAsSubDirectories":, false, },, "PartitionKeys":, [],, "TableType":, "EXTERNAL_TABLE",, "Parameters":, {, "classification":, "parquet",, "compressionType":, "snappy",, "projection.enabled":, "false",, "typeOfData":, "file", },, "CreatedBy":, "my_role",, "IsRegisteredWithLakeFormation":, false,, "CatalogId":, "my_catalog_id",, "VersionId":, "0", }, my_table_name
Removing the --name option from update-table instead gets me: aws.exe: error: the following arguments are required: --name
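My suspicion is that, because $tableinput is unquoted, the shell splits the JSON into separate words, which would explain why every token shows up in the Unknown options list. Below is a minimal sketch of what I think the call should look like, assuming that is the problem (unverified), either quoting the variable or handing the CLI a file via file://. The table JSON may also still need trimming to only the fields TableInput accepts, but I have not confirmed that:
# Sketch only: quote the variable so the shell passes the JSON as a single argument
aws glue update-table \
    --database-name "$databasename" \
    --name "$tablename" \
    --table-input "$tableinput"

# Alternatively (also unverified): write the JSON to a file and let the CLI load it
echo "$tableinput" > tableinput.json
aws glue update-table \
    --database-name "$databasename" \
    --name "$tablename" \
    --table-input file://tableinput.json
# Note: the JSON above may still contain fields (DatabaseName, CreateTime, CreatedBy, ...)
# that TableInput does not accept, which might need to be stripped first.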