Questions tagged [apache-iceberg]

Apache Iceberg (often referred to simply as Iceberg) is a high-performance table format for large analytic datasets. It allows SQL tables to be consumed by engines such as Apache Spark, Apache Flink, Apache Hive, Trino, PrestoDB, Impala, StarRocks, Doris, and Pig.

68 questions
1
vote
0 answers

Cannot read my Glue Catalog table from a Glue notebook with Spark DataFrames

Hello, I have built an Apache Iceberg database in S3 and added it to the Glue Catalog so that I can query it from Athena. Now I am trying to perform some ETL from Glue notebooks, but it keeps returning the following error: AnalysisException:…
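A common cause of an AnalysisException like this in a Glue notebook is that the session has neither the Iceberg Spark extensions nor a Glue-backed Iceberg catalog configured. A minimal configuration sketch, assuming Glue 4.0 with the Iceberg jars on the classpath (e.g. via the `--datalake-formats iceberg` session parameter); the catalog name, warehouse path, and table names below are placeholders:

```python
# Sketch: registering a Glue-Data-Catalog-backed Iceberg catalog in a
# Glue 4.0 Spark session. "glue_catalog", the warehouse path, and the
# table name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-bucket/warehouse/")
    .getOrCreate()
)

# Iceberg tables must be read through the Iceberg catalog prefix:
df = spark.table("glue_catalog.my_db.my_iceberg_table")
```

Reading the table without the `glue_catalog.` prefix goes through the default (non-Iceberg) catalog, which is another frequent source of this error.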
1
vote
1 answer

Iceberg - MERGE INTO TABLE is not supported temporarily

I tried to merge data from a Parquet file and got java.lang.UnsupportedOperationException: MERGE INTO TABLE is not supported temporarily. I use Spark 3.3.0 with Iceberg 1.1.0 running on a Dataproc cluster that is already attached to a Dataproc metastore…
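In my experience this UnsupportedOperationException usually means the MERGE statement is being resolved without the Iceberg SQL extensions enabled, or against a target table that is not in an Iceberg catalog. A sketch, assuming an Iceberg catalog named `iceberg` is configured elsewhere; the table names and S3 path are placeholders:

```python
# Sketch: MERGE INTO needs the Iceberg SQL extensions and an Iceberg
# target table. "iceberg", "db.target", and the S3 path are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Stage the Parquet data, then merge it into the Iceberg table:
spark.read.parquet("s3://my-bucket/staging/").createOrReplaceTempView("updates")
spark.sql("""
    MERGE INTO iceberg.db.target AS t
    USING updates AS u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```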
1
vote
0 answers

pip install iceberg - Preparing metadata (setup.py) error

Command: pip install iceberg Returns this error: C:\Users\abc>pip install iceberg Collecting iceberg Using cached iceberg-0.4.tar.gz (17 kB) Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py…
1
vote
0 answers

How to copy an existing Glue table to an Iceberg-format table with Athena?

I have a lot of JSON files in S3 which are updated frequently. Basically, I am doing CRUD operations in a data lake. Because Apache Iceberg can handle item-level manipulations, I would like to migrate my data to use Apache Iceberg as the table…
Khan
  • 1,418
  • 1
  • 25
  • 49
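One way to do this migration is an Athena CTAS statement that writes the existing Glue table into a new Iceberg table (this assumes Athena engine v3). A sketch; the database, table, and bucket names are placeholders:

```python
# Sketch: Athena CTAS from an existing Glue (JSON) table into a new
# Iceberg table. All names and S3 locations are placeholders; run the
# statement in the Athena console or submit it with the boto3
# start_query_execution API.
ctas = """
CREATE TABLE my_db.my_table_iceberg
WITH (
    table_type = 'ICEBERG',
    location = 's3://my-bucket/warehouse/my_table_iceberg/',
    is_external = false
)
AS SELECT * FROM my_db.my_json_table
"""
print(ctas)
```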
1
vote
0 answers

Table data empty in Athena

Hi, I am struggling to create a table on AWS Athena. The table DDL is as follows: CREATE TABLE iceberg.matrix_b_blueprint_1 ( entrytime timestamp, key string, ingestion_time timestamp, field_0 int, field_1 string, field_2 boolean, field_3…
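A common reason a table like this comes up empty is that the DDL creates a plain table rather than an Iceberg one: in Athena, an Iceberg table needs TBLPROPERTIES ('table_type' = 'ICEBERG') and an explicit LOCATION. A sketch of a corrected DDL; the S3 path is a placeholder and the column list is abbreviated:

```python
# Sketch: Athena DDL for an Iceberg table. Without the table_type
# property Athena will not treat the table as Iceberg. The S3 location
# is a placeholder; the column list is abbreviated.
ddl = """
CREATE TABLE iceberg.matrix_b_blueprint_1 (
    entrytime timestamp,
    key string,
    ingestion_time timestamp,
    field_0 int,
    field_1 string,
    field_2 boolean
)
LOCATION 's3://my-bucket/warehouse/matrix_b_blueprint_1/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
"""
print(ddl)
```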
1
vote
1 answer

How to rewrite Apache Iceberg data files to another format?

I'd like to use Apache Iceberg's Spark/Java-based API for rewriting data files on my Iceberg table. I'm writing my data files in Avro format, but I'd like to rewrite them to Parquet. Is it possible in a reasonably easy way? I've researched…
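One possible approach (a sketch, not the only way): change the table's default write format to Parquet and then run Iceberg's rewrite_data_files maintenance procedure, which rewrites existing data files using the current table settings. The catalog and table names are placeholders; a SparkSession with the Iceberg extensions and catalog is assumed to be configured already:

```python
# Sketch: switch the table's default write format, then rewrite the
# existing Avro data files so they come out as Parquet.
# "my_catalog" and "db.tbl" are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog assumed configured

spark.sql("""
    ALTER TABLE my_catalog.db.tbl
    SET TBLPROPERTIES ('write.format.default' = 'parquet')
""")
spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.tbl')")
```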
1
vote
1 answer

Apache Iceberg Schema Evolution using Spark

Currently I am using Iceberg in my project, and I have a question about it. My current scenario: I have loaded the data into my Iceberg table using a Spark DataFrame (I am doing this through Spark…
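For reference, Iceberg exposes schema evolution through Spark SQL DDL once the Iceberg extensions are enabled. A sketch with placeholder catalog, table, and column names:

```python
# Sketch: common Iceberg schema-evolution statements in Spark SQL.
# Requires the Iceberg SQL extensions; all names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # extensions assumed configured

spark.sql("ALTER TABLE my_catalog.db.tbl ADD COLUMN new_col string")
spark.sql("ALTER TABLE my_catalog.db.tbl RENAME COLUMN new_col TO renamed_col")
spark.sql("ALTER TABLE my_catalog.db.tbl ALTER COLUMN some_int TYPE bigint")
```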
0
votes
0 answers

Apache Iceberg ListType(StructType) columns not working in Spark SQL

I am trying to ADD a COLUMN to an existing Iceberg table using Spark SQL, but I get an invalid-SQL-syntax error when constructing the SQL string. The function that creates the Iceberg column is the following: def sparkTypeToType[A <: DataType](sparkType:…
Oscar Drai
  • 141
  • 1
  • 7
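For a list-of-struct column, the Spark SQL type string is array&lt;struct&lt;...&gt;&gt;. A sketch of the ADD COLUMN statement; the table name and struct fields are placeholders:

```python
# Sketch: Spark SQL for adding an Iceberg ListType(StructType) column.
# The table name and struct fields are placeholders.
sql = (
    "ALTER TABLE my_catalog.db.tbl "
    "ADD COLUMN events array<struct<et:string,ct:string>>"
)
print(sql)
```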
0
votes
1 answer

Unable to get array size in Apache Iceberg with Spark 3.2.0 or earlier

According to the official doc (https://spark.apache.org/docs/latest/api/sql/index.html#array_size), it is present from Spark 3.3.0, but I need the same in Spark 3.2.0. Is there an alternative to array_size that I can use while writing a SQL query for data…
Alok Singh
  • 31
  • 5
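array_size was added in Spark 3.3.0, but the much older size() function covers the same use case on 3.2; note that NULL handling is governed by spark.sql.legacy.sizeOfNull, so check the behavior on NULL arrays in your version. A sketch:

```python
# Sketch: using size() instead of array_size() on Spark 3.2 and earlier.
# The SQL form is the same: SELECT size(arr) FROM tbl. Behavior on NULL
# arrays depends on spark.sql.legacy.sizeOfNull.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([1, 2, 3],), ([],)], ["arr"])
df.select(F.size("arr").alias("n")).show()
```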
0
votes
0 answers

Unable to scale Trino queries

We are trying to scale up Trino queries and are currently failing. We use Trino to query Iceberg data into Dask in a JupyterLab notebook, running on GKE (Kubernetes). We are using Dask to check Trino performance, as using SQL client apps…
0
votes
1 answer

Extract from List of JSON

I have a string field: [{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"}, {"et": "AQ","ct":"EC"}]. I want the "ct" values to be combined into a new column, something like MC_TC_EC. The table is an Iceberg table. I have looked into…
Alok Singh
  • 31
  • 5
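One way to express this in Spark SQL (the Iceberg table is queried like any other) is to parse the string with from_json, project out the "ct" fields with transform, and join them with concat_ws. Column and table names below are placeholders; the plain-Python version shows the intended result:

```python
# Spark SQL sketch (placeholder names):
#   SELECT concat_ws('_',
#            transform(from_json(payload,
#                                'array<struct<et:string,ct:string>>'),
#                      x -> x.ct))
#   FROM tbl
# The same logic in plain Python, for reference:
import json

payload = '[{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"},{"et": "AQ","ct":"EC"}]'
combined = "_".join(item["ct"] for item in json.loads(payload))
print(combined)  # → MC_TC_EC
```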
0
votes
0 answers

Read Iceberg/Glue table from Glue Notebook Job

I'm really new to Spark, Glue, and Iceberg, and I'm trying to read data from an Iceberg table using a Glue 4.0 notebook. I have two different tables: data, a normal table from Parquet files, and iceberg_data, an Iceberg table. The code I'm using in my Glue…
vinicvaz
  • 105
  • 1
  • 11
0
votes
0 answers

Unable to read Hive external tables using Spark 3.3.x with Iceberg jars and erasure coding

When trying to read erasure-coding-enabled Hive external tables in an on-prem HDFS environment with Iceberg jars using Spark 3.3.1, I get the error below. I am able to read the same table created with the default config, i.e. without erasure coding. Is there…
Atif
  • 2,011
  • 9
  • 23
0
votes
0 answers

Bucketed joins in PySpark/Iceberg

I'm trying to perform a join between two tables in PySpark using the Iceberg format. I'm trying to use bucketing to improve performance and avoid a shuffle, but it appears to have no effect whatsoever. What might I be missing? Code for…
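A hedged note: with Iceberg, bucket partitioning only eliminates the join shuffle through storage-partitioned joins, which require Spark 3.3+ and are off by default; on earlier versions the bucketing does not change the join plan. A sketch, assuming both tables share the same bucket spec on the join key (config key is the Spark 3.3 name; table names are placeholders):

```python
# Sketch: enabling storage-partitioned joins (Spark 3.3+) so that two
# Iceberg tables bucketed identically on the join key can be joined
# without an Exchange. Table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.sources.v2.bucketing.enabled", "true")
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")  # avoid broadcast while testing
    .getOrCreate()
)

left = spark.table("my_catalog.db.orders")      # partitioned by bucket(16, id)
right = spark.table("my_catalog.db.customers")  # partitioned by bucket(16, id)
left.join(right, "id").explain()  # check the plan for absent Exchange nodes
```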
0
votes
1 answer

Conditional SQL query with a self-join under given conditions

I have an Iceberg table something like: Columns: [custID, X, uniqueTransId, ------] Rows: [127, 2, a0, -----] [127, 2, a1, -----] [127, 3, a2, -----] [127, 4, a3, -----] [127, 5, a4, -----] [128, 6, a5, -----] [129, 7, a6, -----] [129, 8, a7, -----] [130, 2, a8,…
Alok Singh
  • 31
  • 5