Questions tagged [iceberg]

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.

134 questions
1
vote
1 answer

Count total partition for Athena table

I have Athena and Athena Iceberg tables partitioned by multiple columns. I want to write logic in a Python script that splits the data when it has more than 100 total partitions, so that the insert succeeds without errors. SHOW PARTITIONS table_name lists all the…
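One way to picture the splitting step this question asks about is a small helper that chunks a list of partition values into groups of at most 100. This is only a sketch in plain Python: the name `batch_partitions` and the sample partition strings are hypothetical, and it does not query Athena or run any insert itself.

```python
from typing import Iterator, List, Sequence

# Limit taken from the question text; adjust if your engine differs.
MAX_PARTITIONS_PER_INSERT = 100


def batch_partitions(
    partitions: Sequence[str],
    batch_size: int = MAX_PARTITIONS_PER_INSERT,
) -> Iterator[List[str]]:
    """Yield consecutive slices of `partitions`, each with at most `batch_size` items."""
    for start in range(0, len(partitions), batch_size):
        yield list(partitions[start:start + batch_size])


# Hypothetical partition values, for illustration only.
sample = [f"p={i}" for i in range(250)]
for batch in batch_partitions(sample):
    # Each batch could then drive one insert statement.
    print(len(batch))
```

Each yielded batch would feed a separate insert, so no single statement touches more than the limit.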
1
vote
1 answer

Spark Shell not working after adding support for Iceberg

We are doing a POC on Iceberg and evaluating it for the first time. Spark Environment: Spark Standalone Cluster Setup (1 master and 5 workers); Spark: spark-3.1.2-bin-hadoop3.2; Scala: 2.12.10; Java: 1.8.0_321; Hadoop: 3.2.0; Iceberg: 0.13.1. As suggested in…
Vikramsinh Shinde
  • 2,742
  • 2
  • 23
  • 29
1
vote
1 answer

Add table description to iceberg table from pyspark

I was able to add a table comment to an Iceberg table using Trino, with this Trino command: comment on table iceberg.table_schema.table_name is 'My Comment'. It is also possible to read that from pyspark using: spark.sql("describe extended…
Itai Sevitt
  • 140
  • 1
  • 7
1
vote
2 answers

Scala Option Types not recognized in apache flink table api

I am building a Flink application which reads data from Kafka topics, applies some transformations, and writes to an Iceberg table. I read the data from a Kafka topic (which is in JSON) and use circe to decode it to a Scala case class with…
Praneeth Ramesh
  • 3,434
  • 1
  • 28
  • 34
1
vote
1 answer

Apache Iceberg to index AWS S3

I have a use case where there are about 100M files stored on S3. I have a separate manifest file for the location of these files, based on my data model. I want to understand if Apache Iceberg is a good fit to provide indexing of my S3…
Raks
  • 870
  • 3
  • 11
  • 27
1
vote
0 answers

'java.lang.VerifyError: Stack map does not match the one at exception handler 70' while using sql-client of flink with iceberg-runtime and hive

According to https://iceberg.apache.org/flink/, I use the Flink sql-client with the -j option: bin/sql-client.sh embedded -j lib/flink-sql-connector-hive-2.3.6_2.11-1.11.3.jar -j lib/iceberg-flink-runtime-0.11.0.jar shell and get the following…
xfly
  • 31
  • 7
1
vote
1 answer

flink: Interrupted while waiting for data to be acknowledged by pipeline

I was doing a POC of Flink CDC + Iceberg. I followed this Debezium tutorial to send CDC events to Kafka: https://debezium.io/documentation/reference/1.4/tutorial.html. My Flink job was working fine and writing data to the Hive table for inserts. But when I…
Ayush Chauhan
  • 441
  • 7
  • 25
1
vote
2 answers

How to write data to Apache Iceberg tables using Spark SQL?

I am trying to familiarize myself with Apache Iceberg and I'm having some trouble understanding how to write some external data to a table using Spark SQL. I have a file, one.csv, sitting in a directory, /data. My Iceberg catalog is configured to…
gclarkjr5
  • 161
  • 1
  • 2
  • 10
0
votes
0 answers

Apache Iceberg ListType(StructType) columns not working in Spark SQL

I am trying to ADD a COLUMN to an existing Iceberg table using Spark SQL, but I get an invalid SQL syntax error when constructing the SQL string. The function that creates the Iceberg column is the following: def sparkTypeToType[A <: DataType](sparkType:…
Oscar Drai
  • 141
  • 1
  • 7
0
votes
1 answer

Not able to get Array size in Apache Iceberg with Spark 3.2.0 or before

According to the official docs (https://spark.apache.org/docs/latest/api/sql/index.html#array_size), array_size is available from Spark 3.3.0, but I need the same in Spark 3.2.0. Is there some alternative to array_size that I can use while writing a SQL query for data…
Alok Singh
  • 31
  • 5
0
votes
0 answers

Unable to scale Trino Queries

We are trying to scale up Trino queries, and are currently failing. We use Trino to query Iceberg data into Dask in a JupyterLab notebook, and we're running on GKE Kubernetes. We are using Dask to check Trino performance as using sql client apps…
0
votes
1 answer

Extract from List of JSON

I have a string field: [{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"}, {"et": "AQ","ct":"EC"}]. I want to combine the "ct" values into a new column, something like MC_TC_EC. The table is an Iceberg table. I have looked into…
Alok Singh
  • 31
  • 5
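To make the requested transformation concrete, here is a minimal sketch in plain Python using only the standard `json` module. It is not a Spark SQL or Iceberg solution, and the helper name `join_ct` is hypothetical; it only shows the string in, string out behaviour the question describes.

```python
import json


def join_ct(raw: str, sep: str = "_") -> str:
    """Parse a JSON array of objects and join their "ct" values with `sep`."""
    records = json.loads(raw)
    return sep.join(record["ct"] for record in records)


# The sample value quoted in the question.
raw = '[{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"}, {"et": "AQ","ct":"EC"}]'
print(join_ct(raw))  # MC_TC_EC
```

The same logic could be wrapped in an engine-side function, but how to register it depends on the engine in use.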
0
votes
0 answers

Read Iceberg/Glue table from Glue Notebook Job

I'm very new to Spark, Glue and Iceberg, and I'm trying to read data from an Iceberg table using a Glue 4.0 notebook. I have two different tables: data (a normal table from Parquet files) and iceberg_data (an Iceberg table). The code I'm using in my glue…
vinicvaz
  • 105
  • 1
  • 11
0
votes
0 answers

Iceberg - Spark - Hive in SparkThrift clarification needed

We are using a Spark Thrift server with the Iceberg table format, Parquet data files, and Spark as the execution engine. 1. When I submit a SQL statement to the Spark Thrift server, what type of statement is it? Is it Spark SQL or HiveSQL? Because I saw…
0
votes
0 answers

Why can I not connect to metastore server from spark iceberg container?

I have a docker compose file that spins up the following containers: Spark Iceberg, Hive metastore, MariaDB, Kafka, Trino, MinIO. My aim is to read streaming data from the Kafka container and write it into an Iceberg table. I am using pyspark for this in a…
Sh_ch
  • 1