Highest Voted 'iceberg' Questions

0

votes

0 answers

How to reduce startup latency with Apache Iceberg on AWS EMR Serverless?

I am using Apache Iceberg on AWS EMR Serverless backed by AWS Glue Data Catalog. Following information found on this page, I am using pre-initialized workers, which should allow EMR to "maintain a warm pool of workers for the application so that…

asked Jan 19 '23 at 11:22

Ismael Ghalimi

3,515
2
22
25

0

votes

0 answers

Inconsistent Behaviors for Multiple SparkSessions when accessing the Iceberg Table

I explored using multiple SparkSessions (to connect to different data sources/data clusters) and found a weird behavior. First, I created a SparkSession to read and write the Iceberg table, and everything worked. Then if I use the new SparkSession (with…

apache-spark apache-spark-sql iceberg

asked Jan 17 '23 at 20:38

batilei

401
3
6
16

0

votes

0 answers

Unable to create iceberg table using Trino

I am trying to create an Iceberg table using Trino, but the DDL does not execute and reports that sort_order is not found. Sample DDL below: Create table .test ( C1 datetime, C2 double , C3 double ) With ( format='parquet', Location='S3a://…

trino iceberg

asked Jan 04 '23 at 14:01

vikram bhati

33
4

0

votes

0 answers

How to update Iceberg table storing time series data

I'm trying to apply some updates to an Iceberg table using pyspark. The original data in the table is: +-------------------+---+---+ | time| A| B| +-------------------+---+---+ |2022-12-01 00:00:00| 1| 6| |2022-12-02 00:00:00| 2| …

pyspark apache-spark-sql time-series iceberg apache-iceberg

asked Dec 27 '22 at 17:04

Jack

149
1
5

0

votes

2 answers

Amazon Athena - Error Create Iceberg table

I used this as a reference to write a CREATE statement that creates an Apache Iceberg table in Amazon Athena's Query Editor, shown below. CREATE TABLE iceberg_table (id int, data string, category string) PARTITIONED BY (category, bucket(16, id)) LOCATION…

sql amazon-web-services amazon-athena iceberg

asked Dec 21 '22 at 11:15

Kensuke Sato

183
1
13

0

votes

0 answers

Flink SQL not rolling iceberg files to hdfs while flink sql streaming job running

I am working on a project using Flink and Iceberg to write data from Kafka to an Iceberg Hive table, or to HDFS using the Hadoop catalog. When I publish a message to Kafka, I can see the message in the Kafka table, but there is no file added in HDFS or row added in Hive…

apache-kafka hive apache-flink flink-sql iceberg

asked Dec 21 '22 at 00:31

Mohammed Adel hassan

1
1

0

votes

2 answers

Spark ignoring package jars included in the configuration of my Spark Session

I keep running into a java.lang.ClassNotFoundException: Failed to find data source: iceberg. Please find packages at https://spark.apache.org/third-party-projects.html error. I am trying to include the…

scala apache-spark jar iceberg

asked Dec 15 '22 at 20:48

cheezit97

155
2
10

0

votes

0 answers

Setting up Spark and Iceberg on Jupyter Notebooks

I need help setting up Spark and Iceberg on Jupyter Notebooks. I followed this tutorial locally, but I can't replicate it on Jupyter Notebooks. Please send any tutorials or workarounds that you may know. Thanks

apache-spark jupyter-notebook iceberg

asked Dec 08 '22 at 09:47

user3476582

75
1
10

0

votes

0 answers

Any way to disable hive-style partitioning?

I am writing an Apache Iceberg table that is synced to a metastore. When generating tables, the partitioning appears as Hive-style when I'd prefer it to be just the singular value. I have also tested Hudi tables, which come with an inherent way to…

apache-spark pyspark hive iceberg

asked Dec 07 '22 at 18:52

uncommonhobo

1

0

votes

1 answer

Apache Iceberg tables not working with AWS Glue in AWS EMR

I'm trying to load a table in a Spark EMR cluster from the Glue catalog, in Apache Iceberg format, that is stored in S3. The table is correctly created because I can query it from AWS Athena. At cluster creation I set this…

amazon-web-services apache-spark aws-glue amazon-emr iceberg

asked Dec 07 '22 at 16:38

Shadowtrooper

1,372
15
28

0

votes

0 answers

How to create an Iceberg table built from multiple S3 buckets?

In my organization, it is a common practice to not hold more than 10-50k objects in one S3 bucket. But in Iceberg I've only seen an option of configuring the S3 bucket location of the data at the table level and not at the data files level. I wonder…

apache-spark amazon-s3 iceberg apache-iceberg

asked Nov 19 '22 at 19:19

apache-northeast

11
1

0

votes

0 answers

Iceberg table creation fails in AWS Glue

The CREATE TABLE statement from Presto against the AWS Glue metastore fails with the error "Lock is not supported by default". presto> create schema isl_progress with(location='s3://xxxxxx/'); CREATE SCHEMA presto> create table…

aws-glue presto iceberg

asked Nov 14 '22 at 17:02

Binoy Thomas

299
3
8

0

votes

1 answer

pyspark configuration for connecting google cloud platform

I can't connect to my Google Cloud Platform via PySpark. Can anyone help? I am not using Dataproc, just a local Spark instance. Background: I have downloaded all the JAR files into $SPARK_HOME/jars,…

apache-spark google-cloud-platform pyspark iceberg

asked Nov 08 '22 at 07:47

KIN SHING WONG

1
1

0

votes

0 answers

Data Compression using Trino Iceberg connector

I am investigating how to compress Parquet files that are sitting on AWS S3. The typical size of those files is 90-100 MB, and the goal is to reduce I/O operations. I am looking to use the Iceberg connector…

trino iceberg

asked Nov 08 '22 at 07:03

user3476582

75
1
10

0

votes

1 answer

Is it possible to query the diff between two Apache Iceberg snapshots?

I have two snapshots in my Iceberg history table, and I want to be able to see the difference between them, or at least the columns/rows that were affected by the last snapshot. Is there an easy way of getting this information?

iceberg apache-iceberg

asked Nov 02 '22 at 22:49

ukrwine10

55
1
6

Questions tagged [iceberg]