Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.
Questions tagged [iceberg]
134 questions
2
votes
3 answers
How to actually delete files in Iceberg
I know that in Apache Iceberg I can set limits on the number and age of snapshots, and that "deleting" data from the table does not remove the underlying data; it simply masks or deletes tracking information.
I would like to actually delete the…

zachd1_618
- 4,210
- 6
- 34
- 47
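One possible direction for the question above, as a hedged sketch: Iceberg's Spark integration ships the stored procedures `expire_snapshots` and `remove_orphan_files`, which physically delete files that no retained snapshot references. The helper functions below only build the `CALL` statements; the function names, the catalog name `my_catalog`, and the table `db.events` are illustrative assumptions, and the statements would need a Spark session with the Iceberg SQL extensions enabled.

```python
from datetime import datetime


def expire_snapshots_sql(catalog: str, table: str,
                         older_than: datetime, retain_last: int = 1) -> str:
    """Build a CALL statement for Iceberg's expire_snapshots procedure.

    Expiring snapshots lets Iceberg delete data files that are no longer
    referenced by any snapshot that is kept.
    """
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"retain_last => {retain_last})"
    )


def remove_orphan_files_sql(catalog: str, table: str) -> str:
    """Build a CALL statement for Iceberg's remove_orphan_files procedure.

    Orphan files are files under the table location that no table
    metadata references at all.
    """
    return f"CALL {catalog}.system.remove_orphan_files(table => '{table}')"


if __name__ == "__main__":
    # Hypothetical names; pass the strings to spark.sql(...) in a real session.
    print(expire_snapshots_sql("my_catalog", "db.events", datetime(2024, 1, 1), 5))
    print(remove_orphan_files_sql("my_catalog", "db.events"))
```

Keeping the statements as plain strings makes it easy to review them before anything is deleted.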
2
votes
0 answers
Registering Iceberg Day Partition Transform UDFs in Spark
I am looking to apply Iceberg's hidden day and year partition transforms to a DataFrame, in the same way as the bucket transform can be applied (see https://iceberg.apache.org/docs/latest/spark-writes/).
Iceberg provides IcebergSpark.registerBucketUDF; I'm…

zachd1_618
- 4,210
- 6
- 34
- 47
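For reference, the day and year transforms asked about above are simple to state: the Iceberg table spec defines `day` as the number of days since 1970-01-01 and `year` as the number of years since 1970. The sketch below reproduces that arithmetic in plain Python; the function names are hypothetical, and it assumes that timestamp values are already expressed in UTC. Wrapping these functions in a Spark UDF is left out here.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def iceberg_day(value: date) -> int:
    """Days since 1970-01-01, matching the spec's `day` transform.

    Accepts a date or a datetime (assumed to be in UTC already).
    """
    # datetime is a subclass of date, so check for it first.
    d = value.date() if isinstance(value, datetime) else value
    return (d - EPOCH).days


def iceberg_year(value: date) -> int:
    """Years since 1970, matching the spec's `year` transform."""
    return value.year - 1970


if __name__ == "__main__":
    sample = date(2017, 11, 16)
    print(iceberg_day(sample), iceberg_year(sample))
```

Values before the epoch come out negative, for example `iceberg_day(date(1969, 12, 31))` is `-1`.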
2
votes
2 answers
Getting error when querying iceberg table via Spark thrift server using beeline client?
I am trying to query an Iceberg table (an external table with data in S3 and metadata in the Hive Metastore) using the Spark Thrift Server that ships with Spark. I am able to query non-Iceberg tables, but when I query an Iceberg table I get the error below. Can we…

Bill Goldberg
- 1,699
- 5
- 26
- 50
2
votes
2 answers
What is the difference between SparkSessionCatalog and SparkCatalog in Iceberg?
As the title says.
The question comes from:
I connect to spark-sql with iceberg catalog like this:
bin/spark-sql \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf…

ElapsedSoul
- 725
- 6
- 18
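As background for the question above: per the Iceberg Spark documentation, `org.apache.iceberg.spark.SparkSessionCatalog` wraps Spark's built-in session catalog (it is configured under the name `spark_catalog`) and falls back to it for non-Iceberg tables, while `org.apache.iceberg.spark.SparkCatalog` defines a separate, Iceberg-only catalog under a name of your choosing. The sketch below collects one such configuration as a Python dict; the catalog name `local`, the helper names, and the warehouse path are assumptions for illustration, not the asker's actual settings.

```python
def iceberg_catalog_conf(warehouse: str) -> dict:
    """Return Spark conf entries that set up both catalog flavours."""
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Replaces the built-in session catalog; non-Iceberg tables still work.
        "spark.sql.catalog.spark_catalog":
            "org.apache.iceberg.spark.SparkSessionCatalog",
        "spark.sql.catalog.spark_catalog.type": "hive",
        # A separate Iceberg-only catalog named "local" (hypothetical name).
        "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.local.type": "hadoop",
        "spark.sql.catalog.local.warehouse": warehouse,
    }


def to_cli_flags(conf: dict) -> list:
    """Render the conf dict as --conf flags for spark-sql or spark-submit."""
    return [f"--conf {key}={value}" for key, value in conf.items()]


if __name__ == "__main__":
    # The warehouse path is a placeholder.
    for flag in to_cli_flags(iceberg_catalog_conf("/tmp/warehouse")):
        print(flag)
```

With this setup, tables in the `local` catalog are addressed as `local.db.table`, while unqualified names resolve through `spark_catalog`.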
2
votes
2 answers
Apache Iceberg table format on ADLS / Azure Data Lake
I am trying to find an integration for using the Iceberg table format on ADLS / Azure Data Lake to perform CRUD operations. Is it possible to use it on Azure without a separate compute engine such as Spark? I think AWS S3 supports this use case. Any…

John
- 35
- 5
2
votes
1 answer
Writing multiple partition specs to Apache Iceberg table
I would like to write to an Iceberg table with a partition spec different from the table's default, so that when I run data compaction the data is compacted according to the default spec (as is possible with the write-format config).
For…

Shimon Steinitz
- 21
- 3
2
votes
0 answers
Athena Iceberg Slow On Empty Table
I am looking at the new Iceberg Tables for AWS Athena. I'm hoping to move my data lake over to Iceberg so that I can significantly reduce the complexity of table partition management and hopefully get some better performance. I created a test…

micah
- 7,596
- 10
- 49
- 90
2
votes
1 answer
Unable to write data in table by Apache Iceberg using Spark
I am new to Apache Iceberg. I want to perform read and write operations using Apache Iceberg. I am using Spark 3.0.0.
code:
System.setProperty("hadoop.home.dir","C:\\hadoop" )
val conf = new SparkConf()
…

Santlal J. Gupta
- 91
- 8
2
votes
3 answers
How to execute a Spark SQL merge statement on an Iceberg table in Databricks?
I'm trying to get Apache Iceberg set up in our Databricks environment and running into an error when executing a MERGE statement in Spark SQL.
This code:
CREATE TABLE iceberg.db.table (id bigint, data string) USING iceberg;
INSERT INTO…

Aaron Kub
- 21
- 2
2
votes
0 answers
Flink's Hive streaming vs Iceberg/Hudi/Delta
There are some open-source data lake solutions that support CRUD/ACID/incremental pull, such as Iceberg, Hudi, and Delta. I think they have done what Flink's Hive streaming wants to do, and even do it better.
So I would like to ask what the real power of Flink's Hive…

Tom
- 5,848
- 12
- 44
- 104
1
vote
1 answer
Write to Iceberg/Glue table from local PySpark session
I want to be able to read from and write to an Iceberg table hosted on AWS Glue, from my local machine, using Python.
I have already:
Created an Iceberg table and registered it on AWS Glue
Populated the Iceberg table with limited data using…

Luiz Tauffer
- 463
- 6
- 17
1
vote
1 answer
Creating an Iceberg Table on S3 Using PyIceberg and Glue Catalog
I am attempting to create an Iceberg Table on S3 using the Glue Catalog and the PyIceberg library. My goal is to define a schema, partitioning specifications, and then create a table using PyIceberg. However, despite multiple attempts, I haven't…

Lew
- 11
- 4
1
vote
0 answers
Dataproc Spark job sometimes gets java.lang.ClassNotFoundException for Iceberg jar file
The Dataproc cluster creation and Spark job submission are scheduled every hour, and the cluster is deleted after the job completes. Sometimes the job fails due to java.lang.ClassNotFoundException:…

suisen
- 53
- 9
1
vote
0 answers
Unable to install Iceberg extensions for PySpark and use MERGE INTO
I have a Python virtual environment in which I have added pyspark v3.4.1. I have run the following command to install the Iceberg package:
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.3.0\
--conf…

Sharan Kumar
- 139
- 1
- 2
- 13
1
vote
0 answers
Spark job very slow at sort stage for Iceberg insert operation with local sort
I am inserting data from one Iceberg table into another Iceberg table with a local sort defined on the destination table: alter table schema1.test_iceberg_ordered1 WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY example_event_cd NULLS LAST. Whereas if I do…

Atif
- 2,011
- 9
- 23