Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.
Questions tagged [iceberg]
134 questions
0 votes
0 answers
ICEBERG - MERGE INTO doesn't work in Glue Job 4.0 from docker image aws-glue-libs:glue_libs_4.0.0_image_01
I have an issue with a Glue Job run from the docker image amazon/aws-glue-libs:glue_libs_4.0.0_image_01 when executing "MERGE INTO" on an ICEBERG table.
I followed the instructions from…
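A minimal sketch of the kind of MERGE INTO involved, assuming a SparkSession configured for Iceberg as the Glue 4.0 image instructions describe; the catalog, table, and column names are placeholders:

from pyspark.sql import SparkSession

# Assumes the Iceberg runtime is on the classpath and a Glue-backed catalog
# named "glue_catalog" is configured, per the docker image documentation.
spark = (SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate())

# Placeholder names: glue_catalog.db.target and a temp view called "updates".
spark.sql("""
    MERGE INTO glue_catalog.db.target t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

MERGE INTO only resolves when the Iceberg session extensions are active, so a missing spark.sql.extensions setting is a commonly reported cause of this statement failing.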

user777
- 1
0 votes
0 answers
Cannot add data files to target table because that table is partitioned and contains non-identity partition transforms which will not be compatible
I am integrating Iceberg with Spark. I tried to create the table test partitioned by hours(END_TIME):
create table local.db.test (
    MSISDN string,
    START_TIME timestamp,
    END_TIME timestamp
)
USING iceberg PARTITIONED BY (hours(END_TIME));
then I…
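For comparison, a plain DataFrame append goes through the DataSourceV2 write path, which supports non-identity transforms and lets Iceberg derive the hours(END_TIME) partition from the column value. A minimal sketch, assuming the DDL above and an Iceberg-enabled SparkSession named spark:

from pyspark.sql import functions as F

# Illustrative row; the writer never names a partition explicitly, since
# Iceberg computes hours(END_TIME) from the timestamp (hidden partitioning).
df = (spark.createDataFrame(
        [("12345", "2023-05-01 10:00:00", "2023-05-01 11:30:00")],
        ["MSISDN", "START_TIME", "END_TIME"])
      .withColumn("START_TIME", F.to_timestamp("START_TIME"))
      .withColumn("END_TIME", F.to_timestamp("END_TIME")))

# writeTo(...).append() uses the v2 path; the older df.write.insertInto(...)
# path is a commonly reported trigger for this incompatibility error.
df.writeTo("local.db.test").append()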

Elsayed
- 2,712
- 7
- 28
- 41
0 votes
0 answers
Apache Iceberg: insert into/merge into/insert overwrite VS MOR/COW
I am currently learning Iceberg. I understand MOR and COW.
In MOR, delete files are created to track updates/deletes. In COW, old data files are copied into new data files and the deletes/updates are written into the new files.
But I have some…
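For reference, Iceberg lets you choose the strategy per operation through table properties, which is where insert/update/merge connects to MOR/COW. A minimal sketch, assuming a format v2 table and an Iceberg-enabled SparkSession named spark (the table name is a placeholder):

# write.delete.mode, write.update.mode and write.merge.mode each accept
# 'copy-on-write' or 'merge-on-read'.
spark.sql("""
    ALTER TABLE local.db.events SET TBLPROPERTIES (
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'copy-on-write'
    )
""")

In other words, INSERT INTO / MERGE INTO / INSERT OVERWRITE are the SQL operations, while MOR and COW describe how each of them materializes changes on disk.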

BAE
- 8,550
- 22
- 88
- 171
0 votes
0 answers
Iceberg - how to avoid full table scan with bigint partition key
I have a Product and an Order table with these schemas:
Product: (
    id: bigint,
    created_date: timestamp
)
USING iceberg
PARTITIONED BY (id)
Order: (
    order_id: bigint,
    product_id: bigint,
    ts: timestamp
)
USING iceberg
PARTITIONED BY (day(ts))
when I do Order…
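A join on product_id alone does not prune Product's identity partitions; one workaround is to collect the referenced keys and push them down as an explicit filter. A sketch under the assumption that the key set is small enough to collect (table names follow the question; an Iceberg-enabled SparkSession named spark is assumed):

from pyspark.sql import functions as F

# Collect the product ids touched by the relevant orders, then query
# Product with an explicit predicate so Iceberg can prune id partitions.
orders = spark.table("local.db.order").where(
    "ts >= date'2023-05-01'")  # placeholder time window
ids = [r.product_id for r in
       orders.select("product_id").distinct().collect()]

products = spark.table("local.db.product").where(
    F.col("id").isin(ids))  # identity-partition predicate enables pruning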

huwng
- 61
- 2
0 votes
0 answers
Conflicting delete files error when running concurrent updates on an Iceberg table
When running 2 concurrent updates on the same partition of an Iceberg table using Spark, I get the following error: Found new conflicting delete files that can apply to records matching .... The updates are on two different entries in the partition…
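Two table properties that commonly matter for this error are the per-operation isolation level and the commit retry count. A minimal sketch (the table name is a placeholder; an Iceberg-enabled SparkSession named spark is assumed):

# 'snapshot' isolation tolerates more concurrency than the 'serializable'
# default; commit.retry.num-retries raises the number of commit attempts.
spark.sql("""
    ALTER TABLE local.db.tbl SET TBLPROPERTIES (
        'write.update.isolation-level' = 'snapshot',
        'commit.retry.num-retries'     = '10'
    )
""")

Note that even snapshot isolation rejects delete files that genuinely overlap the same records, so application-level retries can still be needed.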

CS1999
- 23
- 5
0 votes
0 answers
Iceberg Flink catalog factory cannot find org/apache/hadoop/conf/Configuration
I have a basic application that tries to write some data into an Iceberg table using Flink:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tenv =…

Alexey
- 1,354
- 13
- 30
0 votes
0 answers
File encryption to Iceberg table format using the Iceberg encryption module
Can we do:
CSV file (input) encryption to Iceberg table (output) format using an AWS Glue job with the Iceberg encryption package?
Is it possible to achieve this using PySpark code?

Raj N.
- 13
- 4
0 votes
0 answers
What is the benefit of using the Delta Lake or Iceberg table format?
We currently store data on S3 in parquet format and use the AWS Glue Data Catalog to store table metadata. We add partitions by date or hour. Most of the queries we have are read-only. I am wondering about the benefits that we can get from…

yuyang
- 1,511
- 2
- 15
- 40
0 votes
0 answers
AWS Glue data catalog iceberg commit error
I am using a merge statement to update data in an AWS Glue Data Catalog table which has been created as an ICEBERG table to allow for these updates. When the select in my merge statement returns no results, I see that I get an Iceberg commit…
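A workaround sketch that avoids committing a no-op: skip the MERGE when the source side is empty (all names here are illustrative, not the asker's code; an Iceberg-enabled SparkSession named spark is assumed):

# head(1) is a cheap emptiness check compared to count().
src = spark.table("staging.updates")  # hypothetical source of changes
if src.head(1):
    src.createOrReplaceTempView("updates")
    spark.sql("""
        MERGE INTO glue_catalog.db.target t
        USING updates s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)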

PPSATO
- 95
- 7
0 votes
0 answers
How to write many large parquet files to iceberg quickly using spark?
I'm new to Iceberg and Spark. I created an Iceberg table and want to write my previous data to this Iceberg table.
The data consists of many large parquet files (500 GB per day; each parquet file has 100 fields).
When I write these files to Iceberg, it's very…
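If the existing files already match the table's schema and partition layout, Iceberg's add_files Spark procedure can register them in table metadata instead of rewriting 500 GB per day through a normal write. A minimal sketch (catalog, table, and path are placeholders; an Iceberg-enabled SparkSession named spark is assumed):

# Registers the parquet files with the Iceberg table; no data is copied.
spark.sql("""
    CALL spark_catalog.system.add_files(
        table => 'db.my_iceberg_table',
        source_table => '`parquet`.`s3://bucket/history/2023-05-01/`'
    )
""")

The trade-off is that the files are left as-is, so they will not gain Iceberg-optimized sizing or sort order until a later rewrite.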

Xiao-Long Li
- 1
- 1
0 votes
0 answers
Cannot read Iceberg table after upgrading Glue 3.0 to 4.0 and Iceberg 0.13.0 to 1.2.1
I used AWS Glue 3.0 and the Iceberg Connector 0.13.0 to perform ETL, and it was working well.
But after upgrading AWS Glue to 4.0 and the Iceberg Connector to 1.2.1, an error occurred:
I cannot read Iceberg tables after the version upgrade.
I…

namtvd
- 1
- 1
0 votes
0 answers
Why is the ORC file of Iceberg larger than the ORC file of Hive?
When I use Spark 3.3.2, the code writes the same data into a Hive partitioned table (table A) and an Iceberg partitioned table (table B, metadata stored in Hive); both tables are ORC format and have the same compression strategy.
I am doing the following…

geng qing
- 1
- 1
0 votes
0 answers
Difference between ways to specify target path in Spark Structured Streaming?
In Spark Structured Streaming the target path of streaming write operations can be specified either by adding an .option('path', ) or as an argument to the .start() method. The latter seems to be preferred with Delta Lake, the…
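The two spellings side by side; for file-based sinks, .start(path) is shorthand that fills in the same "path" option before starting. The format, paths, and checkpoint locations below are placeholders, and df is assumed to be a streaming DataFrame:

# Variant 1: the target path as an explicit option.
q1 = (df.writeStream
      .format("iceberg")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/ckpt1")
      .option("path", "s3://bucket/target")
      .start())

# Variant 2: the target path passed to start(); DataStreamWriter.start(path)
# sets the same "path" option internally before starting the query.
q2 = (df.writeStream
      .format("iceberg")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/ckpt2")
      .start("s3://bucket/target"))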

Kai Roesner
- 429
- 3
- 17
0 votes
1 answer
Is there a standard way to get the data lake format from a parquet file? (e.g. Apache Iceberg, Apache Hudi, Delta Lake)
I am writing a parquet clean job using PyArrow.
However, I only want to process native parquet files and skip over any .parquet files that belong to Iceberg, Hudi, or Delta Lake tables.
This is because these formats require updates to be done through the…
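The parquet files themselves carry no such marker; a common heuristic is to look for the table-level metadata directories each format keeps next to the data. A sketch of that heuristic (_delta_log for Delta Lake, .hoodie for Hudi, metadata/*.metadata.json for Iceberg; these are directory conventions, not a standard API):

import os

def guess_table_format(table_root: str) -> str:
    """Guess the table format from well-known metadata directories."""
    if os.path.isdir(os.path.join(table_root, "_delta_log")):
        return "delta"
    if os.path.isdir(os.path.join(table_root, ".hoodie")):
        return "hudi"
    meta = os.path.join(table_root, "metadata")
    if os.path.isdir(meta) and any(
            name.endswith(".metadata.json") for name in os.listdir(meta)):
        return "iceberg"
    return "plain-parquet"

For S3 the os calls would need to be swapped for a listing client such as boto3, and data files can sit several directories below the table root, so the check may need to walk upward from each .parquet file.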

Sam.E
- 175
- 2
- 10
0 votes
0 answers
How to read data of a particular snapshot in Apache Iceberg?
I am doing a POC on Apache Iceberg. I write twice using 2 data files, which creates 2 snapshots. In my example, each file contains 4 rows. The last file is marked as the current snapshot, which is right. While reading the data, I want to…
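Iceberg exposes this through Spark read options and a snapshots metadata table. A minimal sketch (the table name and snapshot id are placeholders; an Iceberg-enabled SparkSession named spark is assumed):

# List the snapshots Iceberg has recorded for the table.
spark.sql(
    "SELECT snapshot_id, committed_at FROM local.db.tbl.snapshots").show()

# Read the table as of a specific snapshot id...
old = (spark.read
       .option("snapshot-id", 5938282783869345678)  # placeholder id
       .format("iceberg")
       .load("local.db.tbl"))

# ...or as of a point in time (milliseconds since the epoch):
# spark.read.option("as-of-timestamp", "1683000000000") \
#     .format("iceberg").load("local.db.tbl")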

Somesh Dhal
- 336
- 2
- 15