Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.
Questions tagged [iceberg]
134 questions
0
votes
2 answers
Version control of big data tables (iceberg)
I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version-control and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column…

Wojtek
0
votes
1 answer
Partition bucket by year and month in PySpark
I have a DF like:
Cod Date
1 2022-01-01
1 2022-01-10
1 2022-02-01
2 2022-03-01
2 2022-04-01
I'm trying to use Apache Iceberg to partition my DF by Cod/Year/Month using hidden partitioning.
spark.sql("CREATE TABLE local.table…

OdiumPura
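For the hidden-partitioning question above, a minimal sketch of one possible DDL. Iceberg's Spark DDL accepts partition transforms such as `months(col)`, which produces one partition per year-month pair, so separate year and month columns should not be needed. The table name `local.db.events`, the column types, and the helper `build_create_table` are assumptions for illustration; the helper only builds the statement text.

```python
def build_create_table(table: str) -> str:
    """Build Spark SQL DDL for an Iceberg table partitioned by Cod and by the
    year-month of Date (Iceberg's months() partition transform)."""
    return (
        f"CREATE TABLE {table} (Cod INT, `Date` DATE) "
        "USING iceberg "
        "PARTITIONED BY (Cod, months(`Date`))"
    )


# Hypothetical table name; adjust catalog and namespace to your setup.
ddl = build_create_table("local.db.events")
print(ddl)
# With a Spark session configured for Iceberg, this would be run as: spark.sql(ddl)
```

Queries that filter on `Date` can then be pruned by partition without referencing a derived year or month column.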
0
votes
0 answers
Apache Iceberg: how to set write.metadata.previous-versions-max
Having many historical metadata files in Apache Iceberg helps us to produce a linear history of table versions and ensures that concurrent writes are not lost.
In Apache Iceberg there is a table write property…

wbrycki
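For the table-property question above, a minimal sketch of setting the property through Spark SQL. Iceberg table properties can be changed with `ALTER TABLE ... SET TBLPROPERTIES`. As far as the Iceberg documentation describes it, old metadata files are only removed when `write.metadata.delete-after-commit.enabled` is also set to `true`, so the sketch sets both. The table name and the helper `build_metadata_retention_sql` are hypothetical.

```python
def build_metadata_retention_sql(table: str, max_versions: int) -> str:
    """Build an ALTER TABLE statement that caps the number of previous
    metadata files kept and enables cleanup after each commit."""
    if max_versions < 1:
        raise ValueError("max_versions must be at least 1")
    props = {
        "write.metadata.previous-versions-max": str(max_versions),
        "write.metadata.delete-after-commit.enabled": "true",
    }
    rendered = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({rendered})"


# Hypothetical table name and retention value.
sql = build_metadata_retention_sql("local.db.events", 10)
print(sql)
# With a Spark session configured for Iceberg: spark.sql(sql)
```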
0
votes
0 answers
Unable to create Iceberg tables using PySpark in Hive
I am trying to create Iceberg (0.11.1) tables in Hive 3.1.1 using PySpark 3.0.2, but I am getting the errors and warnings below.
Any help will be greatly appreciated. Let me know if I need to add any more details.
Code to create table:
spark.sql("…

Atif
0
votes
1 answer
Apache Flink & Iceberg: Not able to process hundreds of RowData types
I have a Flink application that reads arbitrary Avro data, maps it to RowData and uses several FlinkSink instances to write data into Iceberg tables. By arbitrary data I mean that I have 100 types of Avro messages, all of them with a common…

nach0
0
votes
0 answers
How can I improve AWS Athena Iceberg read/write operations?
I have two identical tables; one created as the result of using a crawler on a .csv and the other an Iceberg table created with the following command:
CREATE TABLE dan_grafana.iced (
meter string,
readtime timestamp,
kwh_total…

Dan M
0
votes
1 answer
How to use the replaceWhere option with Apache Iceberg while writing data
I'm currently trying to write data using Iceberg to an external Hive table which is partitioned by the partition_date column.
Before writing the data in Iceberg format, the test table has 2 rows:
("2015-01-02", "S01233",…

Leroy Mikenzi
0
votes
0 answers
Spark 2 CBO on an Iceberg table
I am using Spark 2.4 with an Iceberg table. I want to enable CBO, but I cannot find a way to compute table statistics.
The table is created using the Iceberg Catalog API and data is populated using a Spark DataFrame.
Is there a way to do that?

igreenfield
0
votes
1 answer
How to convert Dataset to List
I would like to know how to convert a Dataset to a List.
The types involved are:
org.apache.avro.generic.GenericRecord
org.apache.spark.sql.Dataset
org.apache.spark.sql.Row
Dataset data = spark.sql(SQL_QUERY)
The result is different…

Roni Koren Kurtberg
0
votes
0 answers
AWS Athena Iceberg Multiple Row UPDATE
I am trying to update an Iceberg table on Athena by joining it with another table.
My script:
UPDATE sampledb.icbg_tenure_20220905 a INNER JOIN sampledb.cit_update b ON a.msisdn_hash = b.msisdn_hash SET a.activation_date = b.activationdate
Getting error…

c0ng111
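For the Athena multi-row update question above, one possible rewrite. The `UPDATE ... INNER JOIN ... SET` form is MySQL-style syntax; a join-driven update is usually expressed as `MERGE INTO` instead. This sketch assumes an Athena engine version that supports `MERGE INTO` on Iceberg tables. The table and column names are taken from the question; the helper `build_merge_sql` is hypothetical and only builds the statement text.

```python
def build_merge_sql(target: str, source: str, key: str,
                    target_col: str, source_col: str) -> str:
    """Build a MERGE INTO statement that updates target_col in every target
    row whose key matches a row in the source table."""
    return (
        f"MERGE INTO {target} a "
        f"USING {source} b "
        f"ON a.{key} = b.{key} "
        f"WHEN MATCHED THEN UPDATE SET {target_col} = b.{source_col}"
    )


sql = build_merge_sql(
    "sampledb.icbg_tenure_20220905",
    "sampledb.cit_update",
    "msisdn_hash",
    "activation_date",
    "activationdate",
)
print(sql)
```

Note that the column on the left side of `SET` is written without the target alias.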
0
votes
1 answer
java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface
While launching the spark-shell with iceberg dependencies, we got the following error:
spark-shell \
--packages org.apache.iceberg:iceberg-spark3-runtime:0.13.0 \
--conf…

Ranga Reddy
0
votes
1 answer
Write an Apache Iceberg table to Azure ADLS / S3 without using an external catalog
I'm trying to create an Iceberg table on cloud object storage.
In the image below we can see that the Iceberg table format needs a catalog. This catalog stores the current metadata pointer, which points to the latest metadata. The Iceberg quick start…

ns15
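For the catalog question above, a minimal sketch of the Spark settings for Iceberg's Hadoop catalog, which keeps the metadata pointer in the warehouse directory itself rather than in an external service. The catalog name `local`, the warehouse path `s3a://my-bucket/warehouse`, and the helper `hadoop_catalog_conf` are assumptions for illustration.

```python
def hadoop_catalog_conf(catalog: str, warehouse: str) -> dict:
    """Return Spark configuration entries that register an Iceberg catalog
    of type 'hadoop' rooted at the given warehouse path."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical catalog name and bucket path.
conf = hadoop_catalog_conf("local", "s3a://my-bucket/warehouse")
for key, value in conf.items():
    # Each entry can be passed to spark-shell or spark-submit as a --conf flag.
    print(f"--conf {key}={value}")
```

The Hadoop catalog relies on the file system for commits, so it is worth checking the Iceberg documentation on concurrent writers before using it on object storage.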
0
votes
1 answer
Unable to query Iceberg table from PySpark script in AWS Glue
I'm trying to read data from an Iceberg table; the data is in ORC format and partitioned by a column. I'm getting this error:
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to fetch table…

lightyagami96
0
votes
1 answer
"Iceberg query cannot be parsed" when trying to create Iceberg table with MAP column data type in Athena?
According to the Athena Iceberg documentation, the map type is supported.
Why do neither of these statements work?
CREATE TABLE iceberg_test1 (id string, themap map)
LOCATION 's3://mybucket/test/iceberg1'
TBLPROPERTIES ( 'table_type' = 'ICEBERG'…

Alex R
0
votes
1 answer
Error when changing partition field in Iceberg, from Spark
We are writing to Iceberg using Spark, and when renaming the partition field, we get a validation error:
org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1)
It…

Itai Sevitt