Highest Voted 'iceberg' Questions

1

vote

0 answers

Connecting Iceberg's JdbcCatalog to Spark session

I have a JdbcCatalog initialized with an H2 database in my local Java code. It is able to create Iceberg tables with the proper schema and partition spec. When I create a Spark session in the same class, it is unable to use the JdbcCatalog already created…

apache-spark iceberg apache-iceberg

asked Jun 04 '23 at 01:02

Ishan Das

11
3

1

vote

0 answers

How to add a comment to an Iceberg table?

Hi, I'm trying to add a comment to an Iceberg table in the Glue catalog, and I used the following: spark.sql(f"""CREATE EXTERNAL TABLE IF NOT EXISTS {schema_name}.{table_name}({columns}) USING iceberg COMMENT 'table…

python pyspark aws-glue iceberg apache-iceberg

asked May 29 '23 at 13:39

Carlos Eduardo Bilar Rodrigues

303
1
15

1

vote

0 answers

Apache Iceberg bug MERGE INTO PySpark with UDF causes: `Cannot generate code for expression`

I have encountered a major bug with MERGE INTO in Spark when writing to the Apache Iceberg format using a Python UDF. The problem is that when the column used in the ON clause of MERGE INTO has been transformed by a UDF, the merge throws an…

apache-spark pyspark user-defined-functions iceberg

asked May 19 '23 at 10:56

thijsvdp

404
3
16

1

vote

0 answers

Presto on iceberg: query failed

The same query succeeds in Spark SQL, and the file also exists! exit ERROR SplitRunner-4-42 com.facebook.presto.execution.executor.TaskExecutor Error processing Split…

hadoop presto iceberg

asked May 16 '23 at 03:56

zhenyu lee

11
1

1

vote

0 answers

Apache Iceberg Sort order id not being respected in Spark

Hi, I have been seeing some unexpected behavior related to the sort ordering of an Iceberg table. I set up the SORT ORDER correctly so that the partitions are ordered. However, the data suggests that it does not respect this…

apache-spark sorting partitioning hive-metastore iceberg

asked May 13 '23 at 13:18

thijsvdp

404
3
16

1

vote

0 answers

Iceberg with Hive Metastore does not create a catalog in Spark and uses default

I have been experiencing some (unexpected?) behavior where a catalog reference in Spark is not reflected in the Hive Metastore. I have followed the Spark configuration according to the documentation, which looks like it should create a new catalog…

apache-spark hive hive-metastore iceberg apache-iceberg

asked May 10 '23 at 09:53

thijsvdp

404
3
16

1

vote

0 answers

Why is it required to use a new Spark Session after writing a streaming dataframe into an Iceberg table to show new changes?

If you use a Spark session to create an Iceberg table with Spark Scala in batch mode, and then run a writeStream process with a MERGE INTO operation, the new changes are not visible from the Spark session used for the batch process. You need to…

scala apache-spark spark-streaming iceberg

asked Feb 08 '23 at 08:55

Emilio

11
1

1

vote

0 answers

Running spark job from AWS Lambda

I would like to get data from an Iceberg table using AWS Lambda. I was able to create all the code and containers, only to discover that AWS Lambda doesn't allow the process substitution that Spark uses…

python amazon-web-services pyspark aws-lambda iceberg

asked Dec 27 '22 at 23:52

Pawel

93
2
7

1

vote

0 answers

How to copy an existing Glue table to an Iceberg-format table with Athena?

I have a lot of JSON files in S3 which are updated frequently. Basically, I am doing CRUD operations in a data lake. Because Apache Iceberg can handle item-level manipulations, I would like to migrate my data to use Apache Iceberg as table…

amazon-s3 amazon-athena iceberg aws-glue apache-iceberg

asked Dec 16 '22 at 22:43

Khan

1,418
1
25
49

1

vote

0 answers

Dynamic partition pruning not working in Spark

There are two tables: one big (T0), one small (T1). I run the code below and expect it to use DPP, but it does not: df = spark.table('T0').select('A', 'B', 'C') df1 = spark.table('T1').select('A') df.join(F.broadcast(df1), ['A']).explain() Then I do a…

apache-spark pyspark apache-spark-sql iceberg

asked Dec 08 '22 at 10:13

Alex Loo

73
1
1
7

1

vote

1 answer

Does spark-sql query plan indicate which table partitions are used?

By looking at spark-sql plans, is there a way I can tell if a particular table (hive/iceberg) partition is being used or not? For example, we have a table that has 3 partitions, let's say A=A_VAL, B=B_VAL, C=C_VAL. By looking at the plan is there a…

apache-spark-sql hive iceberg

asked Nov 30 '22 at 15:11

hba

7,406
10
63
105

1

vote

0 answers

Apache Iceberg on GCS atomic rename

I have a Spark on Dataproc Serverless use case that requires reading and writing the Iceberg format on GCS. Reading through the documentation, I realized that I cannot use the Hadoop table catalog because GCS does not support atomic rename: A Hadoop catalog…

google-cloud-platform atomic dataproc iceberg google-cloud-dataproc-metastore

asked Nov 19 '22 at 09:24

cventr

186
1
12

1

vote

1 answer

How to rewrite Apache Iceberg data files to another format?

I'd like to use Apache Iceberg's Spark-based Java API to rewrite the data files of my Iceberg table. I'm writing my data files in Avro format, but I'd like to rewrite them to Parquet. Is there a reasonably easy way to do this? I've researched…

java apache-spark data-lake iceberg apache-iceberg

asked Nov 13 '22 at 23:22

apache-northeast

11
1

1

vote

1 answer

Iceberg table does not see the generated Parquet file

In my use case, a table in Iceberg format is created. It only receives APPEND operations, since it records events in a time-series stream. To evaluate the Iceberg format for this use case, I created a simple Java program that…

java time-series iceberg

asked Oct 22 '22 at 18:23

João Paraná

1,031
1
9
18

1

vote

1 answer

Apache Iceberg Schema Evolution using Spark

I am currently using Iceberg in my project and have a question about it. My current scenario: I have loaded the data into my Iceberg table using a Spark DataFrame (I am doing this through Spark…

apache-spark iceberg apache-iceberg

asked Aug 16 '22 at 14:00

kunal nandwana

29
3
