
I'm new to Iceberg and Spark. I created an Iceberg table and want to write my historical data into it. The data is a set of large Parquet files (about 500 GB per day; each Parquet file has 100 fields).

Writing these files to Iceberg is very slow. Here is the code:

    spark.read().parquet(path)
            .repartition(500)
            .write().format("iceberg")
            .mode(SaveMode.Append)
            .option("mergeSchema", "true")
            .saveAsTable(table);

    // I also tried the equivalent DataFrameWriterV2 call:
    spark.read().parquet(path)
            .repartition(500)
            .writeTo(table)
            .append();

I found that when I run the code above, no matter how I change the Spark configuration, there is still only one task writing to Iceberg, and it is very slow. These are my spark-submit settings:

    --num-executors 15 \
    --driver-memory 4g \
    --executor-memory 16g \
    --executor-cores 4 \
    --conf spark.memory.fraction=0.6 \
    --conf spark.sql.shuffle.partitions=500 \
    --conf spark.shuffle.io.maxRetries=10 \
    --conf spark.shuffle.io.retryWait=10s \
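
This is how I check that the 500 partitions actually reach the write, and what write settings the table has (a minimal sketch, reusing `spark`, `path`, and `table` from above; Iceberg's `write.distribution-mode` and sort order can add an extra shuffle/sort before the write):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Confirm the repartition takes effect before the write.
    Dataset<Row> df = spark.read().parquet(path).repartition(500);
    System.out.println("partitions before write: " + df.rdd().getNumPartitions());

    // List the table properties; write.distribution-mode and the sort order
    // control whether Iceberg re-shuffles rows at write time.
    spark.sql("SHOW TBLPROPERTIES " + table).show(100, false);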

How can I make the write faster?

Versions: Spark 3.2.0, Iceberg 1.2.1.
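
One thing I am considering, though I have not confirmed it is the cause: if the table's `write.distribution-mode` shuffles all rows of a single daily partition into one task, then either disabling the write-side distribution or enabling fanout writes might restore parallelism. A sketch of both options (the assumption about the cause is mine, not verified):

    // Assumption: the single write task comes from Iceberg clustering rows
    // by partition value (write.distribution-mode=hash) while this backfill
    // only touches one daily partition.

    // Option 1: disable the write-side distribution so the 500 upstream
    // partitions are written in parallel.
    spark.sql("ALTER TABLE " + table
            + " SET TBLPROPERTIES ('write.distribution-mode'='none')");

    // Option 2: keep the distribution but let each task write to multiple
    // table partitions without sorting first (costs more memory per task).
    spark.read().parquet(path)
            .repartition(500)
            .writeTo(table)
            .option("fanout-enabled", "true")
            .append();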
