Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.
Questions tagged [iceberg]
134 questions
0 votes
0 answers
ICEBERG - MERGE INTO doesn't work in Glue Job 4.0 from docker image aws-glue-libs:glue_libs_4.0.0_image_01
I have an issue with a Glue Job run from the docker image amazon/aws-glue-libs:glue_libs_4.0.0_image_01 when executing "MERGE INTO" on an ICEBERG table.
I followed the instructions from…
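A minimal sketch of the kind of MERGE INTO involved, assuming a SparkSession configured for Iceberg as the Glue 4.0 image instructions describe; the catalog, table, and column names are placeholders:

from pyspark.sql import SparkSession

# Assumes the Iceberg runtime is on the classpath and a Glue-backed catalog
# named "glue_catalog" is configured, per the docker image documentation.
spark = (SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate())

# Placeholder names: glue_catalog.db.target and a temp view called "updates".
spark.sql("""
    MERGE INTO glue_catalog.db.target t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

MERGE INTO only resolves when the Iceberg session extensions are active, so a missing spark.sql.extensions setting is a commonly reported cause of this statement failing.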

user777
- 1
0 votes
0 answers
Cannot add data files to target table because that table is partitioned and contains non-identity partition transforms which will not be compatible
I am integrating Iceberg with Spark. I tried to create the table test partitioned by hours(END_TIME):
create table local.db.test (
    MSISDN string,
    START_TIME timestamp,
    END_TIME timestamp
)
USING iceberg PARTITIONED BY (hours(END_TIME));
then I…
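For comparison, a plain DataFrame append goes through the DataSourceV2 write path, which supports non-identity transforms and lets Iceberg derive the hours(END_TIME) partition from the column value. A minimal sketch, assuming the DDL above and an Iceberg-enabled SparkSession named spark:

from pyspark.sql import functions as F

# Illustrative row; the writer never names a partition explicitly, since
# Iceberg computes hours(END_TIME) from the timestamp (hidden partitioning).
df = (spark.createDataFrame(
        [("12345", "2023-05-01 10:00:00", "2023-05-01 11:30:00")],
        ["MSISDN", "START_TIME", "END_TIME"])
      .withColumn("START_TIME", F.to_timestamp("START_TIME"))
      .withColumn("END_TIME", F.to_timestamp("END_TIME")))

# writeTo(...).append() uses the v2 path; the older df.write.insertInto(...)
# path is a commonly reported trigger for this incompatibility error.
df.writeTo("local.db.test").append()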

Elsayed
- 2,712
- 7
- 28
- 41
0 votes
0 answers
Apache Iceberg: insert into/merge into/insert overwrite VS MOR/COW
I am currently learning Iceberg. I understand MOR and COW.
In MOR, delete files are created to track updates/deletes. In COW, old data files are copied into new data files and the deletes/updates are written into the new files.
But I have some…
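For reference, Iceberg lets you choose the strategy per operation through table properties, which is where insert/update/merge connects to MOR/COW. A minimal sketch, assuming a format v2 table and an Iceberg-enabled SparkSession named spark (the table name is a placeholder):

# write.delete.mode, write.update.mode and write.merge.mode each accept
# 'copy-on-write' or 'merge-on-read'.
spark.sql("""
    ALTER TABLE local.db.events SET TBLPROPERTIES (
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'copy-on-write'
    )
""")

In other words, INSERT INTO / MERGE INTO / INSERT OVERWRITE are the SQL operations, while MOR and COW describe how each of them materializes changes on disk.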

BAE
- 8,550
- 22
- 88
- 171
0 votes
0 answers
Iceberg - how to avoid full table scan with bigint partition key
I have a Product and an Order table with these schemas:
Product: (
    id: bigint,
    created_date: timestamp
)
USING iceberg
PARTITIONED BY (id)
Order: (
    order_id: bigint,
    product_id: bigint,
    ts: timestamp
)
USING iceberg
PARTITIONED BY (day(ts))
when I do Order…
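A join on product_id alone does not prune Product's identity partitions; one workaround is to collect the referenced keys and push them down as an explicit filter. A sketch under the assumption that the key set is small enough to collect (table names follow the question; an Iceberg-enabled SparkSession named spark is assumed):

from pyspark.sql import functions as F

# Collect the product ids touched by the relevant orders, then query
# Product with an explicit predicate so Iceberg can prune id partitions.
orders = spark.table("local.db.order").where(
    "ts >= date'2023-05-01'")  # placeholder time window
ids = [r.product_id for r in
       orders.select("product_id").distinct().collect()]

products = spark.table("local.db.product").where(
    F.col("id").isin(ids))  # identity-partition predicate enables pruning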

huwng
- 61
- 2
0 votes
0 answers
Conflicting delete files error when running concurrent updates on an Iceberg table
When running 2 concurrent updates on the same partition of an Iceberg table using Spark, I get the following error: Found new conflicting delete files that can apply to records matching .... The updates are on two different entries in the partition…
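Two table properties that commonly matter for this error are the per-operation isolation level and the commit retry count. A minimal sketch (the table name is a placeholder; an Iceberg-enabled SparkSession named spark is assumed):

# 'snapshot' isolation tolerates more concurrency than the 'serializable'
# default; commit.retry.num-retries raises the number of commit attempts.
spark.sql("""
    ALTER TABLE local.db.tbl SET TBLPROPERTIES (
        'write.update.isolation-level' = 'snapshot',
        'commit.retry.num-retries'     = '10'
    )
""")

Note that even snapshot isolation rejects delete files that genuinely overlap the same records, so application-level retries can still be needed.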

CS1999
- 23
- 5
0 votes
0 answers
Iceberg Flink catalog factory cannot find org/apache/hadoop/conf/Configuration
I have a basic application that tries to write some data into an Iceberg table using Flink:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tenv =…

Alexey
- 1,354
- 13
- 30
0 votes
0 answers
File encryption to Iceberg table format using the Iceberg encryption module
Can we do:
CSV file (input) encryption to Iceberg table (output) format using an AWS Glue job with the Iceberg encryption package?
Is it possible to achieve this using PySpark code?

Raj N.
- 13
- 4
0 votes
0 answers
What is the benefit of using the Delta Lake or Iceberg table format?
We currently store data on S3 in parquet format and use the AWS Glue Data Catalog to store table metadata. We add partitions by date or hour. Most of the queries we have are read-only. I am wondering about the benefits that we can get from…

yuyang
- 1,511
- 2
- 15
- 40
0 votes
0 answers
AWS Glue data catalog iceberg commit error
I am using a merge statement to update data in an AWS Glue Data Catalog table which has been created as an ICEBERG table to allow for these updates. When the select in my merge statement returns no results, I see that I get an Iceberg commit…
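A workaround sketch that avoids committing a no-op: skip the MERGE when the source side is empty (all names here are illustrative, not the asker's code; an Iceberg-enabled SparkSession named spark is assumed):

# head(1) is a cheap emptiness check compared to count().
src = spark.table("staging.updates")  # hypothetical source of changes
if src.head(1):
    src.createOrReplaceTempView("updates")
    spark.sql("""
        MERGE INTO glue_catalog.db.target t
        USING updates s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)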

PPSATO
- 95
- 7
0 votes
0 answers
How to write many large parquet files to iceberg quickly using spark?
I'm new to Iceberg and Spark. I created an Iceberg table and want to write my previous data to this Iceberg table.
The data consists of many large parquet files (500 GB per day; each parquet file has 100 fields).
When I write these files to Iceberg, it's very…
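If the existing files already match the table's schema and partition layout, Iceberg's add_files Spark procedure can register them in table metadata instead of rewriting 500 GB per day through a normal write. A minimal sketch (catalog, table, and path are placeholders; an Iceberg-enabled SparkSession named spark is assumed):

# Registers the parquet files with the Iceberg table; no data is copied.
spark.sql("""
    CALL spark_catalog.system.add_files(
        table => 'db.my_iceberg_table',
        source_table => '`parquet`.`s3://bucket/history/2023-05-01/`'
    )
""")

The trade-off is that the files are left as-is, so they will not gain Iceberg-optimized sizing or sort order until a later rewrite.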

Xiao-Long Li
- 1
- 1
0 votes
0 answers
Cannot read Iceberg table after upgrading Glue 3.0 to 4.0 and Iceberg 0.13.0 to 1.2.1
I used AWS Glue 3.0 and the Iceberg Connector 0.13.0 to perform ETL, and it was working well.
But after upgrading AWS Glue to 4.0 and the Iceberg Connector to 1.2.1, an error occurred:
I cannot read Iceberg tables after the version upgrade.
I…

namtvd
- 1
- 1
0 votes
0 answers
Why is the ORC file of Iceberg larger than the ORC file of Hive?
When I use Spark 3.3.2, the code writes the same data into a Hive partitioned table (table A) and an Iceberg partitioned table (table B, metadata stored in Hive); both tables are ORC format and have the same compression strategy.
I am doing the following…

geng qing
- 1
- 1
0 votes
0 answers
Difference between ways to specify target path in Spark Structured Streaming?
In Spark Structured Streaming the target path of streaming write operations can be specified either by adding an .option('path', ) or as an argument to the .start() method. The latter seems to be preferred with Delta Lake, the…
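The two spellings side by side; for file-based sinks, .start(path) is shorthand that fills in the same "path" option before starting. The format, paths, and checkpoint locations below are placeholders, and df is assumed to be a streaming DataFrame:

# Variant 1: the target path as an explicit option.
q1 = (df.writeStream
      .format("iceberg")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/ckpt1")
      .option("path", "s3://bucket/target")
      .start())

# Variant 2: the target path passed to start(); DataStreamWriter.start(path)
# sets the same "path" option internally before starting the query.
q2 = (df.writeStream
      .format("iceberg")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/ckpt2")
      .start("s3://bucket/target"))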

Kai Roesner
- 429
- 3
- 17
0 votes
1 answer
Is there a standard way to get the data lake format from a parquet file? (e.g. Apache Iceberg, Apache Hudi, Delta Lake)
I am writing a parquet clean job using PyArrow.
However, I only want to process native parquet files and skip over any .parquet files that belong to Iceberg, Hudi, or Delta Lake tables.
This is because these formats require updates to be done through the…
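The parquet files themselves carry no such marker; a common heuristic is to look for the table-level metadata directories each format keeps next to the data. A sketch of that heuristic (_delta_log for Delta Lake, .hoodie for Hudi, metadata/*.metadata.json for Iceberg; these are directory conventions, not a standard API):

import os

def guess_table_format(table_root: str) -> str:
    """Guess the table format from well-known metadata directories."""
    if os.path.isdir(os.path.join(table_root, "_delta_log")):
        return "delta"
    if os.path.isdir(os.path.join(table_root, ".hoodie")):
        return "hudi"
    meta = os.path.join(table_root, "metadata")
    if os.path.isdir(meta) and any(
            name.endswith(".metadata.json") for name in os.listdir(meta)):
        return "iceberg"
    return "plain-parquet"

For S3 the os calls would need to be swapped for a listing client such as boto3, and data files can sit several directories below the table root, so the check may need to walk upward from each .parquet file.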

Sam.E
- 175
- 2
- 10
0 votes
0 answers
How to read data of a particular snapshot in Apache Iceberg?
I am doing a POC on Apache Iceberg. I write twice using 2 data files, which creates 2 snapshots. In my example, each file contains 4 rows. The last file is marked as the current snapshot, which is right. While reading the data, I want to…
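Iceberg exposes this through Spark read options and a snapshots metadata table. A minimal sketch (the table name and snapshot id are placeholders; an Iceberg-enabled SparkSession named spark is assumed):

# List the snapshots Iceberg has recorded for the table.
spark.sql(
    "SELECT snapshot_id, committed_at FROM local.db.tbl.snapshots").show()

# Read the table as of a specific snapshot id...
old = (spark.read
       .option("snapshot-id", 5938282783869345678)  # placeholder id
       .format("iceberg")
       .load("local.db.tbl"))

# ...or as of a point in time (milliseconds since the epoch):
# spark.read.option("as-of-timestamp", "1683000000000") \
#     .format("iceberg").load("local.db.tbl")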

Somesh Dhal
- 336
- 2
- 15