Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.
Questions tagged [iceberg]
134 questions
1
vote
0 answers
Connecting Iceberg's JdbcCatalog to Spark session
I have a JdbcCatalog initialized with an H2 database in my local Java code.
It is able to create Iceberg tables with the proper schema and partition spec.
When I create a Spark session in the same class, it is unable to use the JdbcCatalog already created…
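For reference, a minimal sketch of pointing a Spark session at the same JDBC-backed catalog (sketched in PySpark; the catalog name, H2 URI and warehouse path are placeholders, and the Iceberg Spark runtime jar is assumed to be on the classpath):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.my_jdbc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_jdbc.catalog-impl", "org.apache.iceberg.jdbc.JdbcCatalog")
    .config("spark.sql.catalog.my_jdbc.uri", "jdbc:h2:file:/tmp/iceberg_catalog")
    .config("spark.sql.catalog.my_jdbc.warehouse", "file:/tmp/iceberg_warehouse")
    .getOrCreate()
)

# The Spark catalog must point at the same JDBC store and warehouse the JdbcCatalog used.
spark.sql("SELECT * FROM my_jdbc.db.my_table").show()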

Ishan Das
- 11
- 3
1
vote
0 answers
How to insert a comment in an Iceberg table?
I'm trying to put a comment on the Iceberg table in the Glue catalog, and I used it as follows:
spark.sql(f"""CREATE EXTERNAL TABLE IF NOT EXISTS {schema_name}.{table_name}({columns})
USING iceberg
COMMENT 'table…
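For reference, a hedged sketch of where the comment clause usually sits in Spark's Iceberg DDL (catalog, schema and table names are placeholders; Iceberg's Spark examples use CREATE TABLE rather than CREATE EXTERNAL TABLE):

spark.sql("""
CREATE TABLE IF NOT EXISTS glue_catalog.my_schema.my_table (
  id bigint,
  name string)
USING iceberg
COMMENT 'table of example records'
""")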

Carlos Eduardo Bilar Rodrigues
- 303
- 1
- 15
1
vote
0 answers
Apache Iceberg bug MERGE INTO PySpark with UDF causes: `Cannot generate code for expression`
I have encountered a major bug with MERGE INTO in Spark when writing into Apache Iceberg format using a python UDF. The problem is that when the column that is used in the ON clause of MERGE INTO has been affected by a UDF, the merge throws an…
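A hedged workaround sketch, assuming the issue is the UDF appearing inside the MERGE condition: materialize the UDF output into the source view first, so the ON clause only references a plain column (table, view and column names below are placeholders):

from pyspark.sql import functions as F

normalize_id = F.udf(lambda s: s.strip().lower(), "string")

src = spark.table("staging.updates").withColumn("join_key", normalize_id("raw_id"))
src.createOrReplaceTempView("updates_prepared")

spark.sql("""
    MERGE INTO my_catalog.db.target t
    USING updates_prepared s
    ON t.join_key = s.join_key
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")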

thijsvdp
- 404
- 3
- 16
1
vote
0 answers
Presto on iceberg: query failed
The same query succeeds in Spark SQL, and the file also exists!
ERROR SplitRunner-4-42
com.facebook.presto.execution.executor.TaskExecutor Error
processing Split…

zhenyu lee
- 11
- 1
1
vote
0 answers
Apache Iceberg Sort order id not being respected in Spark
Hi, I have been seeing some unexpected behavior related to the sort ordering of an Iceberg table. The problem is that I set up the SORT ORDER correctly such that the partitions are ordered. However, from the data it seems that this is not respected…
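For reference, Iceberg's Spark SQL extensions expose the sort order as DDL; a hedged sketch (catalog, table and column names are placeholders). The order applies to new writes, so files written before it was set are only reordered by a compaction pass:

spark.sql("ALTER TABLE my_catalog.db.events WRITE ORDERED BY event_time")

# Rewrite existing files according to the table's sort order:
spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.events', strategy => 'sort')")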

thijsvdp
- 404
- 3
- 16
1
vote
0 answers
Iceberg with Hive Metastore does not create a catalog in Spark and uses default
I have been experiencing some (unexpected?) behavior where a catalog reference in Spark is not reflected in the Hive Metastore. I have followed the Spark configuration according to the documentation, which looks like it should create a new catalog…
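A minimal sketch of registering a Hive-Metastore-backed Iceberg catalog in Spark (catalog name, thrift URI and warehouse are placeholders); note that registering the catalog by itself writes nothing to the metastore until a namespace or table is created through it:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.hive_cat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.hive_cat.type", "hive")
    .config("spark.sql.catalog.hive_cat.uri", "thrift://metastore-host:9083")
    .config("spark.sql.catalog.hive_cat.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS hive_cat.db")
spark.sql("CREATE TABLE IF NOT EXISTS hive_cat.db.t (id bigint) USING iceberg")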

thijsvdp
- 404
- 3
- 16
1
vote
0 answers
Why is it required to use a new Spark Session after writing a streaming dataframe into an Iceberg table to show new changes?
If you use a Spark session to create an Iceberg table with Spark Scala in batch mode, and afterwards you run a writeStream process with a MERGE INTO operation, it's not possible to see the new changes from the Spark session used in the batch process.
You need to…
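Two things that usually make the batch session see the streamed commits without recreating it, sketched with placeholder catalog and table names: disable the Iceberg Spark catalog's table cache, or refresh the table before re-reading.

# Option 1: when building the session, turn off catalog caching:
#   .config("spark.sql.catalog.my_cat.cache-enabled", "false")

# Option 2: refresh the cached metadata before querying again:
spark.sql("REFRESH TABLE my_cat.db.events")
spark.table("my_cat.db.events").show()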

Emilio
- 11
- 1
1
vote
0 answers
Running a Spark job from AWS Lambda
I would like to get data from an Iceberg table using AWS Lambda. I was able to create all the code and containers, only to discover that AWS Lambda doesn't allow the process substitution that Spark uses…

Pawel
- 93
- 2
- 7
1
vote
0 answers
How to copy an existing Glue table to an Iceberg-format table with Athena?
I have a lot of JSON files in S3 which are updated frequently. Basically I am doing CRUD operations in a data lake. Because Apache Iceberg can handle item-level manipulations, I would like to migrate my data to use Apache Iceberg as the table…
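A hedged sketch of the usual two-step copy, run as Athena SQL from Python via boto3 (bucket, database and table names are placeholders; Athena engine v3 is assumed for Iceberg DML):

import boto3

athena = boto3.client("athena")

create_iceberg = """
CREATE TABLE my_db.events_iceberg (
  id string,
  payload string,
  event_time timestamp)
LOCATION 's3://my-bucket/iceberg/events_iceberg/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
"""

copy_data = "INSERT INTO my_db.events_iceberg SELECT id, payload, event_time FROM my_db.events_glue"

# Queries run asynchronously; in practice, wait for each execution to finish before the next.
for sql in (create_iceberg, copy_data):
    athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "my_db"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )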

Khan
- 1,418
- 1
- 25
- 49
1
vote
0 answers
Dynamic partition pruning not working in Spark
There are two tables: one big (T0), one small (T1). I run the code below and expect it to use DPP, but it does not:
from pyspark.sql import functions as F

df = spark.table('T0').select('A', 'B', 'C')
df1 = spark.table('T1').select('A')
df.join(F.broadcast(df1), ['A']).explain()
Then I do a…
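For reference, a hedged sketch of the shape DPP usually needs: the large table partitioned on the join key and a selective filter on the small side (table and column names are placeholders). A broadcast join with no filter on T1 gives the optimizer nothing to prune with:

from pyspark.sql import functions as F

fact = spark.table("T0")                                    # assumed partitioned by A
dim = spark.table("T1").filter(F.col("D") == "x").select("A")

# Look for a dynamicpruningexpression / runtime filter on the scan of T0 in the plan.
fact.join(F.broadcast(dim), ["A"]).explain(True)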

Alex Loo
- 73
- 1
- 1
- 7
1
vote
1 answer
Does spark-sql query plan indicate which table partitions are used?
By looking at spark-sql plans, is there a way I can tell if a particular table (hive/iceberg) partition is being used or not?
For example, we have a table that has 3 partitions, let's say A=A_VAL, B=B_VAL, C=C_VAL. By looking at the plan is there a…
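A hedged sketch of where partition use typically shows up (table and column names are placeholders): for Hive-style tables the FileScan node carries a PartitionFilters list, while for Iceberg the filters appear on the BatchScan, and the files metadata table shows how data files map to partitions:

spark.sql("SELECT * FROM db.t WHERE A = 'A_VAL'").explain(True)
# Hive/Parquet tables: look for "PartitionFilters: [...]" on the FileScan node.
# Iceberg tables: look at the filters listed on the BatchScan, or inspect metadata:
spark.sql("SELECT file_path, partition FROM db.t.files").show(truncate=False)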

hba
- 7,406
- 10
- 63
- 105
1
vote
0 answers
Apache Iceberg on GCS atomic rename
I have a Spark on Dataproc Serverless use case which requires reading/writing Iceberg format on GCS.
Reading through the documentation I realized that I cannot use the Hadoop table catalog because GCS does not support atomic rename:
A Hadoop catalog…
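A hedged sketch of the usual way around the rename requirement: use a metastore-backed catalog (for example Hive Metastore, JDBC, or a REST catalog) instead of a Hadoop catalog, so commits are atomic swaps in the catalog rather than file renames on GCS; the names and URIs below are placeholders:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.gcs_cat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.gcs_cat.type", "hive")   # any non-Hadoop catalog type works here
    .config("spark.sql.catalog.gcs_cat.uri", "thrift://metastore-host:9083")
    .config("spark.sql.catalog.gcs_cat.warehouse", "gs://my-bucket/warehouse")
    .getOrCreate()
)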

cventr
- 186
- 1
- 12
1
vote
1 answer
How to rewrite Apache Iceberg data files to another format?
I'd like to use Apache Iceberg's Spark (Java) API for rewriting the data files of my Iceberg table. I'm writing my data files in Avro format, but I'd like to rewrite them to Parquet. Is it possible in a reasonably easy way?
I've researched…
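One hedged way to do this from Spark SQL (catalog and table names are placeholders): change the table's default write format, then compact so existing files are rewritten in the new format. The rewrite-all option (available in recent Iceberg releases) forces files to be rewritten even if they are already well sized:

spark.sql("ALTER TABLE my_cat.db.t SET TBLPROPERTIES ('write.format.default' = 'parquet')")

spark.sql("""
  CALL my_cat.system.rewrite_data_files(
    table => 'db.t',
    options => map('rewrite-all', 'true'))
""")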

apache-northeast
- 11
- 1
1
vote
1 answer
Iceberg table does not see the generated Parquet file
In my use case, a table in Iceberg format is created. It only receives APPEND operations, as it records events from a time-series stream. To evaluate the use of the Iceberg format in this use case, I created a simple Java program that…
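For reference, a hedged sketch of the usual explanation: a Parquet file written directly into the table location is invisible until it is committed to Iceberg metadata, either by writing through the Iceberg writer or by registering the files afterwards (catalog, table and path names are placeholders):

# Write through Iceberg so the commit creates a new snapshot:
df = spark.createDataFrame([(1, "a")], ["id", "name"])
df.writeTo("my_cat.db.events").append()

# Or register Parquet files that were already written out-of-band:
spark.sql("""
  CALL my_cat.system.add_files(
    table => 'db.events',
    source_table => '`parquet`.`s3://bucket/path/to/files`')
""")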

João Paraná
- 1,031
- 1
- 9
- 18
1
vote
1 answer
Apache Iceberg Schema Evolution using Spark
Currently I am using Iceberg in my project and I have a question about it.
My current scenario:
I have loaded the data into my Iceberg table using a Spark DataFrame (I am doing this through Spark…
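A hedged sketch of the two common schema-evolution paths in Spark (catalog, table and column names are placeholders): explicit DDL, or letting a DataFrame write merge new columns when the table property allows it:

# Explicit evolution via Iceberg's Spark DDL:
spark.sql("ALTER TABLE my_cat.db.t ADD COLUMN new_col string")
spark.sql("ALTER TABLE my_cat.db.t ALTER COLUMN amount TYPE bigint")

# Or let a write add missing columns (new_df is a placeholder DataFrame):
spark.sql("ALTER TABLE my_cat.db.t SET TBLPROPERTIES ('write.spark.accept-any-schema' = 'true')")
new_df.writeTo("my_cat.db.t").option("merge-schema", "true").append()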

kunal nandwana
- 29
- 3