The AWS Glue crawler has a prefix property for the tables it adds. If I leave the prefix empty and point the crawler at s3://my-bucket/some-table-backup, it creates a table named some-table-backup. Is there a way to rename it to my-awesome-table and keep the crawler updating the renamed table? Or can the crawler be set up to create the new table with a name I provide?
13

Cherry
5 Answers
8
It's not possible to set up the crawler to do this, but it is very fast to create a new table that is the same as the table created by the crawler in every way, except the name. In Python:
import boto3

database_name = "database"
table_name = "prefix-dir_name"
new_table_name = "more_awesome_name"

client = boto3.client("glue")
response = client.get_table(DatabaseName=database_name, Name=table_name)
table_input = response["Table"]
table_input["Name"] = new_table_name

# Delete keys that cause create_table to fail.
# pop(key, None) avoids a KeyError when the response omits a key.
table_input.pop("CreatedBy", None)
table_input.pop("CreateTime", None)
table_input.pop("UpdateTime", None)
table_input.pop("DatabaseName", None)
table_input.pop("IsRegisteredWithLakeFormation", None)
catalog_id = table_input.pop("CatalogId")

client.create_table(
    DatabaseName=database_name,
    TableInput=table_input,
    CatalogId=catalog_id,
)
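Since the set of read-only keys returned by get_table can grow over time, an allow-list may be more robust than popping keys one by one. The following is only a sketch: the helper names (TABLE_INPUT_KEYS, to_table_input, copy_table) are made up for illustration, and the key list is an assumption based on the TableInput fields documented for boto3, so check it against your boto3 version.

```python
# Keys that Glue's TableInput structure accepts (assumption: taken from the
# boto3 create_table documentation; newer API versions may accept more).
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def to_table_input(table, new_name):
    """Return a renamed copy of a get_table()["Table"] dict with only TableInput keys."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    table_input["Name"] = new_name
    return table_input


def copy_table(client, database_name, table_name, new_table_name):
    """Create a renamed copy of a table. `client` is expected to be a boto3 Glue client."""
    table = client.get_table(DatabaseName=database_name, Name=table_name)["Table"]
    client.create_table(
        DatabaseName=database_name,
        TableInput=to_table_input(table, new_table_name),
    )
```

It would be called as copy_table(boto3.client("glue"), "database", "prefix-dir_name", "more_awesome_name"). Anything not on the allow-list (CreatedBy, VersionId, CatalogId and so on) is dropped without having to be named.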
3
Encountered the same issue. Needed to drop more attributes than in Dan Hook's answer before the table could be queried in Redshift.
table_input="$(aws glue --region us-west-2 get-table --database-name database --name old_table --query 'Table' | jq '{Name: "new_table", StorageDescriptor, TableType, Parameters}')"
aws glue create-table --region us-west-2 --database-name database --table-input "$table_input"
aws glue delete-table --region us-west-2 --database-name database --name "old_table"

dbaumann
Good approach, but this didn't work with a partitioned table. – Miae Kim Apr 22 '21 at 23:10
2
An extension of Dan's solution that also handles partitioned tables.
import boto3

database_name = "some_database"
table_name = "old_table_name"
new_table_name = "new_table_name"

client = boto3.client("glue", region_name="us-east-1")
response = client.get_table(DatabaseName=database_name, Name=table_name)
partitions = client.get_partitions(DatabaseName=database_name, TableName=table_name)["Partitions"]
table_input = response["Table"]
table_input["Name"] = new_table_name

# Delete keys that cause create_table to fail.
# pop(key, None) avoids a KeyError when the response omits a key.
table_input.pop("CreatedBy", None)
table_input.pop("CreateTime", None)
table_input.pop("UpdateTime", None)
table_input.pop("DatabaseName", None)
table_input.pop("IsRegisteredWithLakeFormation", None)
table_input.pop("CatalogId", None)

# Delete keys in partitions that batch_create_partition does not accept
for partition in partitions:
    partition.pop("DatabaseName", None)
    partition.pop("TableName", None)
    partition.pop("CreationTime", None)
    partition.pop("CatalogId", None)

# Create the new table
client.create_table(DatabaseName=database_name, TableInput=table_input)

# Create partitions
client.batch_create_partition(
    DatabaseName=database_name,
    TableName=new_table_name,
    PartitionInputList=partitions,
)
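One thing to watch with larger tables: get_partitions returns its results in pages, and batch_create_partition accepts at most 100 partitions per call. A rough sketch of how the partition copy could handle both is below. The helper names (to_partition_input, chunked, copy_partitions) are hypothetical, and the key allow-list is an assumption based on the PartitionInput fields documented for boto3.

```python
# Keys that Glue's PartitionInput structure accepts (assumption: based on the
# boto3 documentation; verify against the version you use).
PARTITION_INPUT_KEYS = {
    "Values", "LastAccessTime", "StorageDescriptor",
    "Parameters", "LastAnalyzedTime",
}
MAX_BATCH = 100  # batch_create_partition takes at most 100 partitions per call


def to_partition_input(partition):
    """Keep only the keys that are valid in a PartitionInput."""
    return {k: v for k, v in partition.items() if k in PARTITION_INPUT_KEYS}


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def copy_partitions(client, database_name, table_name, new_table_name):
    """Copy all partitions to the new table and return any reported errors."""
    # The paginator follows NextToken so that every page is read.
    paginator = client.get_paginator("get_partitions")
    partitions = []
    for page in paginator.paginate(DatabaseName=database_name, TableName=table_name):
        partitions.extend(to_partition_input(p) for p in page["Partitions"])

    errors = []
    for batch in chunked(partitions, MAX_BATCH):
        response = client.batch_create_partition(
            DatabaseName=database_name,
            TableName=new_table_name,
            PartitionInputList=batch,
        )
        errors.extend(response.get("Errors", []))
    return errors
```

Call it after the new table exists, for example copy_partitions(client, database_name, table_name, new_table_name), and inspect the returned list, since batch_create_partition reports per-partition failures in its response rather than raising.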

Myz
0
As Dan mentioned, crawlers can't rename the table. Either rename it with a Python script in a Glue job, or create a new external Hive table in Amazon Athena and point it at the location of the old table.
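For the Athena route, the DDL could look roughly like what the helper below produces. This is only an illustration: build_external_table_ddl is a made-up helper, and the column list and the PARQUET storage format are assumptions that have to match the files the crawler actually found.

```python
def build_external_table_ddl(database, table, columns, location, stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for Athena.

    `columns` is a list of (name, type) pairs. The storage format default is
    an assumption; use whatever format the data at `location` is stored in.
    """
    column_sql = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical columns; replace them with the schema of the crawled table.
ddl = build_external_table_ddl(
    "database",
    "my_awesome_table",
    [("id", "bigint"), ("name", "string")],
    "s3://my-bucket/some-table-backup/",
)
print(ddl)
```

The printed statement can be run in the Athena query editor. A table created this way is separate from the one the crawler maintains, so later schema changes would have to be applied to it by hand.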

Kishore Bharathy
0
To extend Dan's answer further, make sure you remove VersionId as well. See below for an update:
import boto3

database_name = "database"
table_name = "prefix-dir_name"
new_table_name = "more_awesome_name"

client = boto3.client("glue")
response = client.get_table(DatabaseName=database_name, Name=table_name)
table_input = response["Table"]
table_input["Name"] = new_table_name

# Delete keys that cause create_table to fail.
# pop(key, None) avoids a KeyError when the response omits a key.
table_input.pop("CreatedBy", None)
table_input.pop("CreateTime", None)
table_input.pop("UpdateTime", None)
table_input.pop("DatabaseName", None)
table_input.pop("IsRegisteredWithLakeFormation", None)
table_input.pop("VersionId", None)
catalog_id = table_input.pop("CatalogId")

client.create_table(
    DatabaseName=database_name,
    TableInput=table_input,
    CatalogId=catalog_id,
)

Sean Lindo