Questions tagged [apache-iceberg]

Apache Iceberg (often referred to simply as Iceberg) is a high-performance table format for large analytic datasets. It allows SQL tables to be consumed by engines such as Apache Spark, Apache Flink, Apache Hive, Trino, PrestoDB, Impala, StarRocks, Doris, and Pig.

68 questions
1
vote
0 answers

Cannot read my Glue Catalog table from a Glue notebook with Spark DataFrames

Hello, I have built an Apache Iceberg database in S3 and added it to the Glue Catalog so that I can query it from Athena. Now I am trying to perform some ETL from Glue notebooks, but it keeps returning the following error: AnalysisException:…
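A common cause of an AnalysisException like this in a Glue notebook is that the session has neither the Iceberg Spark extensions nor a Glue-backed Iceberg catalog configured. A minimal configuration sketch, assuming Glue 4.0 with the Iceberg jars on the classpath (e.g. via the `--datalake-formats iceberg` session parameter); the catalog name, warehouse path, and table names below are placeholders:

```python
# Sketch: registering a Glue-Data-Catalog-backed Iceberg catalog in a
# Glue 4.0 Spark session. "glue_catalog", the warehouse path, and the
# table name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-bucket/warehouse/")
    .getOrCreate()
)

# Iceberg tables must be read through the Iceberg catalog prefix:
df = spark.table("glue_catalog.my_db.my_iceberg_table")
```

Reading the table without the `glue_catalog.` prefix goes through the default (non-Iceberg) catalog, which is another frequent source of this error.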
1
vote
1 answer

Iceberg - MERGE INTO TABLE is not supported temporarily

I tried to merge data from a Parquet file and got java.lang.UnsupportedOperationException: MERGE INTO TABLE is not supported temporarily. I use Spark 3.3.0 with Iceberg 1.1.0 running on a Dataproc cluster that is already attached to a Dataproc metastore…
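In my experience this UnsupportedOperationException usually means the MERGE statement is being resolved without the Iceberg SQL extensions enabled, or against a target table that is not in an Iceberg catalog. A sketch, assuming an Iceberg catalog named `iceberg` is configured elsewhere; the table names and S3 path are placeholders:

```python
# Sketch: MERGE INTO needs the Iceberg SQL extensions and an Iceberg
# target table. "iceberg", "db.target", and the S3 path are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Stage the Parquet data, then merge it into the Iceberg table:
spark.read.parquet("s3://my-bucket/staging/").createOrReplaceTempView("updates")
spark.sql("""
    MERGE INTO iceberg.db.target AS t
    USING updates AS u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```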
1
vote
0 answers

pip install iceberg - Preparing metadata (setup.py) error

Command: pip install iceberg Returns this error: C:\Users\abc>pip install iceberg Collecting iceberg Using cached iceberg-0.4.tar.gz (17 kB) Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py…
1
vote
0 answers

How to copy an existing Glue table to an Iceberg-format table with Athena?

I have a lot of JSON files in S3 which are updated frequently. Basically, I am doing CRUD operations in a data lake. Because Apache Iceberg can handle item-level manipulations, I would like to migrate my data to use Apache Iceberg as the table…
Khan
  • 1,418
  • 1
  • 25
  • 49
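One way to do this migration is an Athena CTAS statement that writes the existing Glue table into a new Iceberg table (this assumes Athena engine v3). A sketch; the database, table, and bucket names are placeholders:

```python
# Sketch: Athena CTAS from an existing Glue (JSON) table into a new
# Iceberg table. All names and S3 locations are placeholders; run the
# statement in the Athena console or submit it with the boto3
# start_query_execution API.
ctas = """
CREATE TABLE my_db.my_table_iceberg
WITH (
    table_type = 'ICEBERG',
    location = 's3://my-bucket/warehouse/my_table_iceberg/',
    is_external = false
)
AS SELECT * FROM my_db.my_json_table
"""
print(ctas)
```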
1
vote
0 answers

Table data empty in Athena

Hi, I am struggling to create a table on AWS Athena. The table DDL is as follows: CREATE TABLE iceberg.matrix_b_blueprint_1 ( entrytime timestamp, key string, ingestion_time timestamp, field_0 int, field_1 string, field_2 boolean, field_3…
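A common reason a table like this comes up empty is that the DDL creates a plain table rather than an Iceberg one: in Athena, an Iceberg table needs TBLPROPERTIES ('table_type' = 'ICEBERG') and an explicit LOCATION. A sketch of a corrected DDL; the S3 path is a placeholder and the column list is abbreviated:

```python
# Sketch: Athena DDL for an Iceberg table. Without the table_type
# property Athena will not treat the table as Iceberg. The S3 location
# is a placeholder; the column list is abbreviated.
ddl = """
CREATE TABLE iceberg.matrix_b_blueprint_1 (
    entrytime timestamp,
    key string,
    ingestion_time timestamp,
    field_0 int,
    field_1 string,
    field_2 boolean
)
LOCATION 's3://my-bucket/warehouse/matrix_b_blueprint_1/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
"""
print(ddl)
```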
1
vote
1 answer

How to rewrite Apache Iceberg data files to another format?

I'd like to use Apache Iceberg's Spark/Java-based API for rewriting data files on my Iceberg table. I'm writing my data files in Avro format, but I'd like to rewrite them to Parquet. Is it possible in a reasonably easy way? I've researched…
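One possible approach (a sketch, not the only way): change the table's default write format to Parquet and then run Iceberg's rewrite_data_files maintenance procedure, which rewrites existing data files using the current table settings. The catalog and table names are placeholders; a SparkSession with the Iceberg extensions and catalog is assumed to be configured already:

```python
# Sketch: switch the table's default write format, then rewrite the
# existing Avro data files so they come out as Parquet.
# "my_catalog" and "db.tbl" are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog assumed configured

spark.sql("""
    ALTER TABLE my_catalog.db.tbl
    SET TBLPROPERTIES ('write.format.default' = 'parquet')
""")
spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.tbl')")
```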
1
vote
1 answer

Apache Iceberg Schema Evolution using Spark

Currently I am using Iceberg in my project, and I have a question about it. My current scenario: I have loaded the data into my Iceberg table using a Spark DataFrame (I am doing this through Spark…
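For reference, Iceberg exposes schema evolution through Spark SQL DDL once the Iceberg extensions are enabled. A sketch with placeholder catalog, table, and column names:

```python
# Sketch: common Iceberg schema-evolution statements in Spark SQL.
# Requires the Iceberg SQL extensions; all names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # extensions assumed configured

spark.sql("ALTER TABLE my_catalog.db.tbl ADD COLUMN new_col string")
spark.sql("ALTER TABLE my_catalog.db.tbl RENAME COLUMN new_col TO renamed_col")
spark.sql("ALTER TABLE my_catalog.db.tbl ALTER COLUMN some_int TYPE bigint")
```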
0
votes
0 answers

Apache Iceberg ListType(StructType) columns not working in Spark SQL

I am trying to ADD a COLUMN to an existing Iceberg table using Spark SQL, but I get an invalid-SQL-syntax error when constructing the SQL string. The function that creates the Iceberg column is the following: def sparkTypeToType[A <: DataType](sparkType:…
Oscar Drai
  • 141
  • 1
  • 7
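For a list-of-struct column, the Spark SQL type string is array&lt;struct&lt;...&gt;&gt;. A sketch of the ADD COLUMN statement; the table name and struct fields are placeholders:

```python
# Sketch: Spark SQL for adding an Iceberg ListType(StructType) column.
# The table name and struct fields are placeholders.
sql = (
    "ALTER TABLE my_catalog.db.tbl "
    "ADD COLUMN events array<struct<et:string,ct:string>>"
)
print(sql)
```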
0
votes
1 answer

Unable to get array size in Apache Iceberg with Spark 3.2.0 or earlier

According to the official doc (https://spark.apache.org/docs/latest/api/sql/index.html#array_size), it is present from Spark 3.3.0, but I need the same in Spark 3.2.0. Is there an alternative to array_size that I can use while writing a SQL query for data…
Alok Singh
  • 31
  • 5
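array_size was added in Spark 3.3.0, but the much older size() function covers the same use case on 3.2; note that NULL handling is governed by spark.sql.legacy.sizeOfNull, so check the behavior on NULL arrays in your version. A sketch:

```python
# Sketch: using size() instead of array_size() on Spark 3.2 and earlier.
# The SQL form is the same: SELECT size(arr) FROM tbl. Behavior on NULL
# arrays depends on spark.sql.legacy.sizeOfNull.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([1, 2, 3],), ([],)], ["arr"])
df.select(F.size("arr").alias("n")).show()
```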
0
votes
0 answers

Unable to scale Trino queries

We are trying to scale up Trino queries and are currently failing. We use Trino to query Iceberg data into Dask in a JupyterLab notebook, running on GKE (Kubernetes). We are using Dask to check Trino performance, as using SQL client apps…
0
votes
1 answer

Extract from List of JSON

I have a string field: [{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"}, {"et": "AQ","ct":"EC"}]. I want the "ct" values to be combined into a new column, something like MC_TC_EC. The table is an Iceberg table. I have looked into…
Alok Singh
  • 31
  • 5
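One way to express this in Spark SQL (the Iceberg table is queried like any other) is to parse the string with from_json, project out the "ct" fields with transform, and join them with concat_ws. Column and table names below are placeholders; the plain-Python version shows the intended result:

```python
# Spark SQL sketch (placeholder names):
#   SELECT concat_ws('_',
#            transform(from_json(payload,
#                                'array<struct<et:string,ct:string>>'),
#                      x -> x.ct))
#   FROM tbl
# The same logic in plain Python, for reference:
import json

payload = '[{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"},{"et": "AQ","ct":"EC"}]'
combined = "_".join(item["ct"] for item in json.loads(payload))
print(combined)  # → MC_TC_EC
```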
0
votes
0 answers

Read Iceberg/Glue table from Glue Notebook Job

I'm really new to Spark, Glue, and Iceberg, and I'm trying to read data from an Iceberg table using a Glue 4.0 notebook. I have two different tables: data, a normal table from Parquet files, and iceberg_data, an Iceberg table. The code I'm using in my Glue…
vinicvaz
  • 105
  • 1
  • 11
0
votes
0 answers

Unable to read Hive external tables using Spark 3.3.x with Iceberg jars and erasure coding

When trying to read erasure-coding-enabled Hive external tables in an on-prem HDFS environment with Iceberg jars using Spark 3.3.1, I get the error below. I am able to read the same table created with the default config, i.e. without erasure coding. Is there…
Atif
  • 2,011
  • 9
  • 23
0
votes
0 answers

Bucketed joins in PySpark/Iceberg

I'm trying to perform a join between two tables in PySpark using the Iceberg format. I'm trying to use bucketing to improve performance and avoid a shuffle, but it appears to have no effect whatsoever. What might I be missing? Code for…
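A hedged note: with Iceberg, bucket partitioning only eliminates the join shuffle through storage-partitioned joins, which require Spark 3.3+ and are off by default; on earlier versions the bucketing does not change the join plan. A sketch, assuming both tables share the same bucket spec on the join key (config key is the Spark 3.3 name; table names are placeholders):

```python
# Sketch: enabling storage-partitioned joins (Spark 3.3+) so that two
# Iceberg tables bucketed identically on the join key can be joined
# without an Exchange. Table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.sources.v2.bucketing.enabled", "true")
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")  # avoid broadcast while testing
    .getOrCreate()
)

left = spark.table("my_catalog.db.orders")      # partitioned by bucket(16, id)
right = spark.table("my_catalog.db.customers")  # partitioned by bucket(16, id)
left.join(right, "id").explain()  # check the plan for absent Exchange nodes
```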
0
votes
1 answer

Conditional SQL query with a self-join under given conditions

I have an Iceberg table something like: Columns: [custID, X, uniqueTransId, ------] Rows: [127, 2, a0, -----] [127, 2, a1, -----] [127, 3, a2, -----] [127, 4, a3, -----] [127, 5, a4, -----] [128, 6, a5, -----] [129, 7, a6, -----] [129, 8, a7, -----] [130, 2, a8,…
Alok Singh
  • 31
  • 5