Questions tagged [iceberg]

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.

134 questions
0
votes
2 answers

Version control of big data tables (iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version-control and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column…
0
votes
1 answer

Partition bucket by year and month in PySpark

I have a DF with columns Cod and Date, like: (1, 2022-01-01), (1, 2022-01-10), (1, 2022-02-01), (2, 2022-03-01), (2, 2022-04-01). I'm trying to use Apache Iceberg to partition my DF by Cod/Year/Month using hidden partitioning. spark.sql("CREATE TABLE local.table…
OdiumPura
  • 444
  • 5
  • 25
0
votes
0 answers

Apache Iceberg: how to set write.metadata.previous-versions-max

Having many historical metadata files in Apache Iceberg helps us produce a linear history of table versions and ensures that concurrent writes are not lost. In Apache Iceberg there is a table write property…
wbrycki
  • 121
  • 1
  • 8
0
votes
0 answers

Unable to create Iceberg tables using PySpark in Hive

I am trying to create Iceberg (0.11.1) formatted tables in Hive 3.1.1 using PySpark 3.0.2 but am getting the errors and warnings below. Any help will be greatly appreciated. Let me know if I need to add any more details. Code to create table: spark.sql("…
Atif
  • 2,011
  • 9
  • 23
0
votes
1 answer

Apache Flink & Iceberg: Not able to process hundreds of RowData types

I have a Flink application that reads arbitrary AVRO data, maps it to RowData and uses several FlinkSink instances to write data into ICEBERG tables. By arbitrary data I mean that I have 100 types of AVRO messages, all of them with a common…
nach0
  • 379
  • 1
  • 3
  • 14
0
votes
0 answers

How can I improve AWS Athena Iceberg read/write operations?

I have two identical tables; one created as the result of using a crawler on a .csv and the other an Iceberg table created with the following command: CREATE TABLE dan_grafana.iced ( meter string, readtime timestamp, kwh_total…
Dan M
  • 4,340
  • 8
  • 20
0
votes
1 answer

How to use the replaceWhere option with Apache Iceberg while writing data

I'm currently trying to write data using Iceberg to an external Hive table which is partitioned by the partition_date column. Before writing the data in Iceberg format, the test table has 2 rows, ("2015-01-02", "S01233",…
0
votes
0 answers

Spark 2 CBO on Iceberg table

I am using Spark 2.4 with an Iceberg table. I want to enable CBO but I cannot find a way to compute table statistics. The table is created using the Iceberg Catalog API and data is populated using a Spark DataFrame. Is there a way to do that?
igreenfield
  • 1,618
  • 19
  • 36
0
votes
1 answer

How to convert Dataset to List

I would like to know how to convert a Dataset to a List. I'm referring to: org.apache.avro.generic.GenericRecord, org.apache.spark.sql.Dataset, org.apache.spark.sql.Row. Dataset data = spark.sql(SQL_QUERY) The result is different…
Roni Koren Kurtberg
  • 495
  • 1
  • 8
  • 18
0
votes
0 answers

AWS Athena Iceberg Multiple Row UPDATE

I am trying to update an Iceberg table on Athena joined with another table. My script: UPDATE sampledb.icbg_tenure_20220905 a INNER JOIN sampledb.cit_update b ON a.msisdn_hash = b.msisdn_hash SET a.activation_date = b.activationdate Getting error…
c0ng111
  • 31
  • 3
0
votes
1 answer

java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface

While launching spark-shell with Iceberg dependencies, we got the following error: spark-shell \ --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.0 \ --conf…
Ranga Reddy
  • 2,936
  • 4
  • 29
  • 41
0
votes
1 answer

Write Apache Iceberg table to Azure ADLS / S3 without using external catalog

I'm trying to create an Iceberg table on cloud object storage. In the below image we can see that the Iceberg table format needs a catalog. This catalog stores the current metadata pointer, which points to the latest metadata. The Iceberg quick start…
ns15
  • 5,604
  • 47
  • 51
0
votes
1 answer

Unable to query Iceberg table from PySpark script in AWS Glue

I'm trying to read data from an Iceberg table. The data is in ORC format and partitioned by column. I'm getting this error - AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table…
0
votes
1 answer

"Iceberg query cannot be parsed" when trying to create Iceberg table with MAP column data type in Athena?

According to the Athena Iceberg documentation, the map type is supported. Why do neither of these statements work? CREATE TABLE iceberg_test1 (id string, themap map) LOCATION 's3://mybucket/test/iceberg1' TBLPROPERTIES ( 'table_type' = 'ICEBERG'…
Alex R
  • 11,364
  • 15
  • 100
  • 180
0
votes
1 answer

Error when changing partition field in Iceberg, from Spark

We are writing to Iceberg using Spark, and when renaming the partition field name, we are getting a validation error: org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1) It…
Itai Sevitt
  • 140
  • 1
  • 7