I'd like to use the Apache Iceberg Java API for Spark to rewrite the data files of my Iceberg table. My data files are currently written in Avro format, but I'd like to rewrite them as Parquet. Is there a reasonably easy way to do this?

I've looked at the RewriteDataFiles API (https://iceberg.apache.org/javadoc/1.0.0/org/apache/iceberg/actions/RewriteDataFiles.html), which is used through this builder:

SparkActions
    .get()
    .rewriteDataFiles(table)
    .filter(Expressions.equal("date", "2020-08-18"))
    .option("target-file-size-bytes", Long.toString(500 * 1024 * 1024)) // 500 MB
    .execute();

But I could not find how to change the file format.

1 Answer

  1. Set the table's default write format to Parquet:
ALTER TABLE prod.db.sample SET TBLPROPERTIES (
    'write.format.default'='parquet'
)
  2. Rewrite your data files with Spark SQL:
CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id', options => map('rewrite-all', 'true'))

Or use the Spark Java API as you listed above, passing the same 'rewrite-all' option so that every existing file is rewritten.
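If you want to drive the two steps from Java, here is a minimal sketch that only builds the two SQL statements as strings. The class name FormatRewriteSql and its method names are made up for illustration, and the sketch drops the sort strategy from the call above and keeps only the 'rewrite-all' option, on the assumption that the default strategy is acceptable for your table.

```java
import java.util.List;

// Hypothetical helper (not part of Iceberg): builds the two statements from the
// steps above so they can be run in order from a Java Spark job.
public class FormatRewriteSql {

    // Step 1: change the default format used for newly written data files.
    public static String setDefaultFormat(String table, String format) {
        return "ALTER TABLE " + table
            + " SET TBLPROPERTIES ('write.format.default'='" + format + "')";
    }

    // Step 2: ask the rewrite procedure to rewrite every data file,
    // not only the files it would normally pick up for compaction.
    public static String rewriteAllFiles(String catalog, String table) {
        return "CALL " + catalog + ".system.rewrite_data_files("
            + "table => '" + table + "', "
            + "options => map('rewrite-all', 'true'))";
    }

    // Both statements, in the order they should be executed.
    public static List<String> statements(String catalog, String table, String format) {
        return List.of(
            setDefaultFormat(catalog + "." + table, format),
            rewriteAllFiles(catalog, table));
    }

    public static void main(String[] args) {
        // Catalog and table names are placeholders taken from the examples above.
        for (String sql : statements("prod", "db.sample", "parquet")) {
            System.out.println(sql);
        }
    }
}
```

In a Spark job you would pass each string to SparkSession.sql(...) in that order. The inputs are concatenated without escaping, so only use this with trusted table names.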

liliwei