Questions tagged [iceberg]

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.

134 questions
0
votes
0 answers

How to reduce startup latency with Apache Iceberg on AWS EMR Serverless?

I am using Apache Iceberg on AWS EMR Serverless backed by AWS Glue Data Catalog. Following information found on this page, I am using pre-initialized workers, which should allow EMR to "maintain a warm pool of workers for the application so that…
Ismael Ghalimi
  • 3,515
  • 2
  • 22
  • 25
0
votes
0 answers

Inconsistent Behaviors for Multiple SparkSessions when accessing the Iceberg Table

I explored using multiple SparkSessions (to connect to different data sources/data clusters) a bit, and I found a weird behavior. First I created a SparkSession to read and write the Iceberg table, and everything works. Then if I use the new SparkSession (with…
batilei
  • 401
  • 3
  • 6
  • 16
0
votes
0 answers

Unable to create iceberg table using Trino

I am trying to create an Iceberg table using Trino, but the DDL fails to execute, stating that sort_order is not found. Sample DDL below: Create table .test ( C1 datetime, C2 double , C3 double ) With ( format='parquet', Location='S3a://…
0
votes
0 answers

How to update Iceberg table storing time series data

I'm trying to apply some updates to an Iceberg table using pyspark. The original data in the table is: +-------------------+---+---+ | time| A| B| +-------------------+---+---+ |2022-12-01 00:00:00| 1| 6| |2022-12-02 00:00:00| 2| …
Jack
  • 149
  • 1
  • 5
0
votes
2 answers

Amazon Athena - Error Create Iceberg table

I used this as a reference to write a CREATE statement that creates an Apache Iceberg table in Amazon Athena's Query Editor, shown below. CREATE TABLE iceberg_table (id int, data string, category string) PARTITIONED BY (category, bucket(16, id)) LOCATION…
Kensuke Sato
  • 183
  • 1
  • 13
0
votes
0 answers

Flink SQL not rolling Iceberg files to HDFS while Flink SQL streaming job is running

I am working on a project using Flink and Iceberg to write data from Kafka to an Iceberg Hive table or to HDFS using the Hadoop catalog. When I publish a message to Kafka, I can see the message in the Kafka table, but there is no file added in HDFS or row added in Hive…
0
votes
2 answers

Spark ignoring package jars included in the configuration of my Spark Session

I keep running into a java.lang.ClassNotFoundException: Failed to find data source: iceberg. Please find packages at https://spark.apache.org/third-party-projects.html error. I am trying to include the…
cheezit97
  • 155
  • 2
  • 10
0
votes
0 answers

Setting up Spark and Iceberg on Jupyter Notebooks

I need help setting up Spark and Iceberg on Jupyter Notebooks. I followed this tutorial on my local machine, but I can't replicate it on Jupyter Notebooks. Please send any tutorials or workarounds that you may know. Thanks
user3476582
  • 75
  • 1
  • 10
0
votes
0 answers

Any way to disable hive-style partitioning?

I am writing an Apache Iceberg table that is synced to a metastore. When generating tables, the partitioning appears as hive-style when I'd prefer it to be just the singular value. I have also tested Hudi tables, which come with an inherent way to…
0
votes
1 answer

Apache Iceberg tables not working with AWS Glue in AWS EMR

I'm trying to load a table in a Spark EMR cluster from the Glue catalog in Apache Iceberg format that is stored in S3. The table is correctly created because I can query it from AWS Athena. On cluster creation I have set this…
0
votes
0 answers

How to create an Iceberg table built from multiple S3 buckets?

In my organization, it is a common practice to not hold more than 10-50k objects in one S3 bucket. But in Iceberg I've only seen an option of configuring the S3 bucket location of the data at the table level and not at the data files level. I wonder…
0
votes
0 answers

Iceberg table creation fails in AWS Glue

The table create statement from Presto to AWS Glue metastore fails with error "Lock is not supported by default". presto> create schema isl_progress with(location='s3://xxxxxx/'); CREATE SCHEMA presto> create table…
Binoy Thomas
  • 299
  • 3
  • 8
0
votes
1 answer

PySpark configuration for connecting to Google Cloud Platform

I can't connect to my Google Cloud Platform via PySpark; can anyone help? I am not using Dataproc, just a local Spark instance. Background: I have downloaded all the jar files into $SPARK_HOME/jars,…
0
votes
0 answers

Data Compression using Trino Iceberg connector

I am investigating a situation: I am looking to compress Parquet files that are sitting on AWS S3. The typical sizes for those files are 90-100 MB, and the reason is to reduce I/O operations. I am looking to use the Iceberg connector…
user3476582
  • 75
  • 1
  • 10
0
votes
1 answer

Is it possible to query the diff between two Apache Iceberg snapshots?

I have two snapshots in my Iceberg history table, and I want to see the difference between them, or at least which columns/rows were affected by the last snapshot. Is there an easy way of getting this information?
ukrwine10
  • 55
  • 1
  • 6