How can we change the location of a Hudi table to a new location? I have a Customer table saved at s3://aws-amazon-com/Customer/ that I want to move to s3://aws-amazon-com/CustomerUpdated/. I'm working on Glue 4.

Using these jars: hudi-spark3-bundle_2.12-0.12.1.jar, calcite-core-1.16.0.jar, libfb303-0.9.3.jar

import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
import spark.implicits._

val partitionColumnName: String = "year"
val hudiTableName: String = "Customer"
val preCombineKey: String = "id"
val recordKey = "id"
val tablePath = "s3://aws-amazon-com/Customer/"
val databaseName = "consumer_bureau"

val hudiCommonOptions: Map[String, String] = Map(
    "hoodie.table.name" -> hudiTableName,
    "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.precombine.field" -> preCombineKey,
    "hoodie.datasource.write.recordkey.field" -> recordKey,
    "hoodie.datasource.write.operation" -> "bulk_insert",
    //"hoodie.datasource.write.operation" -> "upsert",
    "hoodie.datasource.write.row.writer.enable" -> "true",
    "hoodie.datasource.write.reconcile.schema" -> "true",
    "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
    "hoodie.datasource.write.hive_style_partitioning" -> "true",
    // "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
    //  "hoodie.upsert.shuffle.parallelism" -> "400",
    "hoodie.datasource.hive_sync.enable" -> "true",
    "hoodie.datasource.hive_sync.table" -> hudiTableName,
    "hoodie.datasource.hive_sync.database" -> databaseName,
    "hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
    "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc" -> "false",
    "hoodie.combine.before.upsert" -> "true",
    "hoodie.index.type" -> "BLOOM",
    "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
    DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
  )
  
  
val df = Seq((1, "Mark", 1990), (2, "Martin", 2009)).toDF("id", "name", "year")

// Write to the original location.
df.write.format("org.apache.hudi")
  .options(hudiCommonOptions)
  .mode(SaveMode.Append)
  .save(tablePath)
    
val tablelocationUpdated = "s3://aws-amazon-com/CustomerUpdated/"

// Writing the same data to the new location.
df.write.format("org.apache.hudi")
  .options(hudiCommonOptions)
  .mode(SaveMode.Append)
  .save(tablelocationUpdated)

When I query the customer table in Athena, it still points to s3://aws-amazon-com/Customer/ rather than the updated location s3://aws-amazon-com/CustomerUpdated/ as expected. Can the table location change be achieved using AWS Glue or AWS Lambda?

Please help

2 Answers

Yes, you can change the Hudi table location, but you will also need to change the table's location path in Glue manually (for example, through the AWS console or by using the AWS SDK). Hive sync won't update the location by itself.
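
For the SDK route, here is a minimal sketch in Scala, assuming the AWS SDK for Java v2 Glue module is on the classpath; the database and table names are the ones from the question:

import software.amazon.awssdk.services.glue.GlueClient
import software.amazon.awssdk.services.glue.model.{GetTableRequest, TableInput, UpdateTableRequest}

val glue = GlueClient.create()

// Fetch the current table definition from the Glue Data Catalog.
val existing = glue.getTable(
  GetTableRequest.builder()
    .databaseName("consumer_bureau")
    .name("customer")
    .build()
).table()

// Rebuild the storage descriptor with the new S3 location.
val updatedSd = existing.storageDescriptor().toBuilder()
  .location("s3://aws-amazon-com/CustomerUpdated/")
  .build()

// UpdateTable takes a TableInput, so copy over the fields worth keeping.
val tableInput = TableInput.builder()
  .name(existing.name())
  .tableType(existing.tableType())
  .parameters(existing.parameters())
  .partitionKeys(existing.partitionKeys())
  .storageDescriptor(updatedSd)
  .build()

glue.updateTable(
  UpdateTableRequest.builder()
    .databaseName("consumer_bureau")
    .tableInput(tableInput)
    .build()
)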

parisni
spark.sql(s"""alter table customer set location  's3://aws-amazon-com/CustomerUpdated/ '""")

This will change the table location of the Hudi table.
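
One caveat: ALTER TABLE ... SET LOCATION only changes the table-level location in the catalog. Since this table is partitioned by year with hive-style partitioning, the existing partition entries keep their old S3 paths, and Athena resolves partitioned reads through those partition locations. A sketch that re-points them too; the year values here are just the ones from the sample data, not an exhaustive list:

// Re-point each existing partition to the new base path.
Seq("1990", "2009").foreach { y =>
  spark.sql(
    s"alter table customer partition (year='$y') " +
    s"set location 's3://aws-amazon-com/CustomerUpdated/year=$y/'")
}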

Tyler2P