How can we change the location of a Hudi table to a new location? I have a Customer table saved at s3://aws-amazon-com/Customer/ that I want to move to s3://aws-amazon-com/CustomerUpdated/. I'm working on Glue 4.

Using these jars: hudi-spark3-bundle_2.12-0.12.1.jar, calcite-core-1.16.0.jar, libfb303-0.9.3.jar

import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
import spark.implicits._

val partitionColumnName: String = "year"
val hudiTableName: String = "Customer"
val preCombineKey: String = "id"
val recordKey = "id"
val tablePath = "s3://aws-amazon-com/Customer/"
val databaseName = "consumer_bureau"

val hudiCommonOptions: Map[String, String] = Map(
    "hoodie.table.name" -> hudiTableName,
    "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.precombine.field" -> preCombineKey,
    "hoodie.datasource.write.recordkey.field" -> recordKey,
    "hoodie.datasource.write.operation" -> "bulk_insert",
    //"hoodie.datasource.write.operation" -> "upsert",
    "hoodie.datasource.write.row.writer.enable" -> "true",
    "hoodie.datasource.write.reconcile.schema" -> "true",
    "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
    "hoodie.datasource.write.hive_style_partitioning" -> "true",
    // "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
    //  "hoodie.upsert.shuffle.parallelism" -> "400",
    "hoodie.datasource.hive_sync.enable" -> "true",
    "hoodie.datasource.hive_sync.table" -> hudiTableName,
    "hoodie.datasource.hive_sync.database" -> databaseName,
    "hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
    "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc" -> "false",
    "hoodie.combine.before.upsert" -> "true",
    "hoodie.index.type" -> "BLOOM",
    "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
    DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
  )
  
  
val df = Seq((1, "Mark", 1990), (2, "Martin", 2009)).toDF("id", "name", "year")

// Write to the original location.
df.write.format("org.apache.hudi")
  .options(hudiCommonOptions)
  .mode(SaveMode.Append)
  .save(tablePath)
    
val tablelocationUpdated = "s3://aws-amazon-com/CustomerUpdated/"

// Writing the same data to the new location.
df.write.format("org.apache.hudi")
  .options(hudiCommonOptions)
  .mode(SaveMode.Append)
  .save(tablelocationUpdated)

When I query the customer table in Athena, it still points to s3://aws-amazon-com/Customer/ rather than the updated location s3://aws-amazon-com/CustomerUpdated/ as expected. Can the table location change be achieved using AWS Glue or AWS Lambda?

Please help

2 Answers

Yes, you can change the Hudi table location, but you will also need to change the table's location path in Glue manually (for example, through the AWS console or by using the AWS SDK). Hive sync won't update the location by itself.
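
For the SDK route, here is a minimal sketch in Scala, assuming the AWS SDK for Java v2 Glue module is on the classpath; the database and table names are the ones from the question:

import software.amazon.awssdk.services.glue.GlueClient
import software.amazon.awssdk.services.glue.model.{GetTableRequest, TableInput, UpdateTableRequest}

val glue = GlueClient.create()

// Fetch the current table definition from the Glue Data Catalog.
val existing = glue.getTable(
  GetTableRequest.builder()
    .databaseName("consumer_bureau")
    .name("customer")
    .build()
).table()

// Rebuild the storage descriptor with the new S3 location.
val updatedSd = existing.storageDescriptor().toBuilder()
  .location("s3://aws-amazon-com/CustomerUpdated/")
  .build()

// UpdateTable takes a TableInput, so copy over the fields worth keeping.
val tableInput = TableInput.builder()
  .name(existing.name())
  .tableType(existing.tableType())
  .parameters(existing.parameters())
  .partitionKeys(existing.partitionKeys())
  .storageDescriptor(updatedSd)
  .build()

glue.updateTable(
  UpdateTableRequest.builder()
    .databaseName("consumer_bureau")
    .tableInput(tableInput)
    .build()
)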

parisni
spark.sql(s"""alter table customer set location  's3://aws-amazon-com/CustomerUpdated/ '""")

This will change the table location of the Hudi table.
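
One caveat: ALTER TABLE ... SET LOCATION only changes the table-level location in the catalog. Since this table is partitioned by year with hive-style partitioning, the existing partition entries keep their old S3 paths, and Athena resolves partitioned reads through those partition locations. A sketch that re-points them too; the year values here are just the ones from the sample data, not an exhaustive list:

// Re-point each existing partition to the new base path.
Seq("1990", "2009").foreach { y =>
  spark.sql(
    s"alter table customer partition (year='$y') " +
    s"set location 's3://aws-amazon-com/CustomerUpdated/year=$y/'")
}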

Tyler2P