Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.
Questions tagged [iceberg]
134 questions
0
votes
0 answers
How to reduce startup latency with Apache Iceberg on AWS EMR Serverless?
I am using Apache Iceberg on AWS EMR Serverless backed by AWS Glue Data Catalog. Following information found on this page, I am using pre-initialized workers, which should allow EMR to "maintain a warm pool of workers for the application so that…

Ismael Ghalimi
0
votes
0 answers
Inconsistent Behaviors for Multiple SparkSessions when accessing the Iceberg Table
I explored using multiple SparkSessions (to connect to different data sources/data clusters) and found some weird behavior.
First I created a SparkSession to read and write the Iceberg table, and everything works.
Then if I use the new SparkSession (with…

batilei
0
votes
0 answers
Unable to create Iceberg table using Trino
I am trying to create an Iceberg table using Trino, but the DDL does not execute; the error states that sort_order is not found. Sample DDL below:
Create table .test ( C1 datetime,
C2 double ,
C3 double
)
With ( format='parquet',
Location='S3a://…

vikram bhati
0
votes
0 answers
How to update Iceberg table storing time series data
I'm trying to apply some updates to an Iceberg table using pyspark. The original data in the table is:
+-------------------+---+---+
| time| A| B|
+-------------------+---+---+
|2022-12-01 00:00:00| 1| 6|
|2022-12-02 00:00:00| 2| …

Jack
0
votes
2 answers
Amazon Athena - Error Create Iceberg table
I used this as a reference to write a CREATE statement for an Apache Iceberg table in Amazon Athena's Query Editor. The statement is below.
CREATE TABLE iceberg_table (id int, data string, category string)
PARTITIONED BY (category, bucket(16, id))
LOCATION…

Kensuke Sato
0
votes
0 answers
Flink SQL not rolling Iceberg files to HDFS while Flink SQL streaming job is running
I am working on a project that uses Flink and Iceberg to write data from Kafka to an Iceberg Hive table, or to HDFS using the Hadoop catalog. When I publish a message to Kafka, I can see the message in the Kafka table, but there is no file added in HDFS or row added in Hive…
0
votes
2 answers
Spark ignoring package jars included in the configuration of my Spark Session
I keep running into a java.lang.ClassNotFoundException: Failed to find data source: iceberg. Please find packages at https://spark.apache.org/third-party-projects.html error.
I am trying to include the…

cheezit97
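A note on the question above: the excerpt is cut off, so the actual cause is unknown, but a "Failed to find data source: iceberg" error usually means the Iceberg runtime jar never reached the classpath. The sketch below only collects the settings an Iceberg-enabled session typically needs. The helper names, the catalog name `local`, the warehouse path, and the runtime version are all assumptions for illustration; the runtime coordinates in particular must match the Spark and Scala build actually in use.

```python
# Hypothetical helpers, not part of Spark or Iceberg.
# Assumed example coordinates: Iceberg 1.1.0 runtime for Spark 3.3 / Scala 2.12.
ICEBERG_RUNTIME = "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.1.0"


def iceberg_spark_conf(catalog="local", warehouse="/tmp/warehouse"):
    """Return the Spark settings for a Hadoop-type Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Pulls the Iceberg runtime jar onto the classpath at launch.
        "spark.jars.packages": ICEBERG_RUNTIME,
        # Enables Iceberg SQL extensions (MERGE INTO, CALL procedures).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog and points it at a warehouse directory.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def as_submit_args(conf):
    """Render the settings as spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

With PySpark installed, each pair could instead be passed to `SparkSession.builder.config(key, value)` before the first `getOrCreate()`. As far as I know, `spark.jars.packages` is resolved when the JVM starts, so setting it on a session that is already running has no effect, which is one common way to end up with this error.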
0
votes
0 answers
Setting up Spark and Iceberg on Jupyter Notebooks
I need help setting up Spark and Iceberg on Jupyter Notebooks. I followed this tutorial on my local machine but I can't replicate it on Jupyter Notebooks. Please send any tutorials or workarounds that you may know.
Thanks

user3476582
0
votes
0 answers
Any way to disable hive-style partitioning?
I am writing an Apache Iceberg table that is synced to a metastore. When generating tables, the partitioning appears as Hive-style when I'd prefer it to be just the singular value. I have also tested Hudi tables, which come with an inherent way to…
0
votes
1 answer
Apache Iceberg tables not working with AWS Glue in AWS EMR
I'm trying to load a table in a Spark EMR cluster from the Glue catalog in Apache Iceberg format that is stored in S3. The table is correctly created because I can query it from AWS Athena. On cluster creation I have set this…

Shadowtrooper
0
votes
0 answers
How to create an Iceberg table built from multiple S3 buckets?
In my organization, it is a common practice to not hold more than 10-50k objects in one S3 bucket. But in Iceberg I've only seen an option of configuring the S3 bucket location of the data at the table level and not at the data files level.
I wonder…

apache-northeast
0
votes
0 answers
Iceberg table creation fails in AWS Glue
The CREATE TABLE statement from Presto to the AWS Glue metastore fails with the error "Lock is not supported by default".
presto> create schema isl_progress with(location='s3://xxxxxx/');
CREATE SCHEMA
presto> create table…

Binoy Thomas
0
votes
1 answer
PySpark configuration for connecting to Google Cloud Platform
I can't connect to my Google Cloud Platform via PySpark. Can anyone help?
I am not using Dataproc, just a local Spark instance.
Background:
I have downloaded all the JAR files into $SPARK_HOME/jars,…

KIN SHING WONG
0
votes
0 answers
Data Compression using Trino Iceberg connector
I am investigating how to compress Parquet files stored on AWS S3. The files are typically 90-100 MB, and the goal is to reduce I/O operations. I am looking to use the Iceberg connector…

user3476582
0
votes
1 answer
Is it possible to query the diff between two Apache Iceberg snapshots?
I have two snapshots in my Iceberg history table, and I want to see the difference between them, or at least which columns/rows were affected by the last snapshot. Is there an easy way of getting this information?

ukrwine10
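To make concrete what a "diff between two snapshots" could report, here is a minimal sketch in plain Python. It is not an Iceberg API: it assumes the rows of each snapshot have already been loaded into lists of dicts (how they are loaded, for instance through a time-travel read of each snapshot, is left out) and that a hypothetical `id` column uniquely identifies a row.

```python
def diff_snapshots(old_rows, new_rows, key="id"):
    """Compare two row sets and report added rows, removed rows,
    and, for rows present in both, the names of the columns that changed.

    Assumes `key` is unique within each row set (an assumption, not
    something Iceberg enforces).
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]

    changed = {}
    for k in old.keys() & new.keys():
        # Columns whose values differ between the two versions of the row.
        cols = sorted(
            c
            for c in old[k].keys() | new[k].keys()
            if old[k].get(c) != new[k].get(c)
        )
        if cols:
            changed[k] = cols
    return added, removed, changed
```

For example, comparing `[{"id": 1, "b": 6}]` with `[{"id": 1, "b": 9}]` reports no added or removed rows and `{1: ["b"]}` as the changed columns. On large tables the same comparison would normally be pushed down to the query engine rather than done in driver memory.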