Highest Voted 'iceberg' Questions

0

votes

2 answers

Version control of big data tables (iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version-control and deploy changes to these tables in a CI/CD process. E.g., I would like to add a column…

asked Oct 27 '22 at 07:34

Wojtek

1

0

votes

1 answer

Partition bucket by year and month in PySpark

I have a DF like:
Cod  Date
1    2022-01-01
1    2022-01-10
1    2022-02-01
2    2022-03-01
2    2022-04-01
I'm trying to use Apache Iceberg to partition my DF by Cod/Year/Month using hidden partitioning. spark.sql("CREATE TABLE local.table…

apache-spark pyspark iceberg

asked Oct 24 '22 at 19:29

OdiumPura

444
5
25

0

votes

0 answers

Apache iceberg: how to set write.metadata.previous-versions-max

Having many historical metadata files in Apache Iceberg helps us to produce a linear history of table versions and ensures that concurrent writes are not lost. In Apache Iceberg there is a table write property…

iceberg apache-iceberg

asked Oct 20 '22 at 12:13

wbrycki

121
1
8

0

votes

0 answers

Unable to create Iceberg tables using Pyspark in Hive

I am trying to create Iceberg (0.11.1) formatted tables in Hive 3.1.1 using PySpark 3.0.2, but I am getting the errors and warnings below. Any help will be greatly appreciated. Let me know if I need to add any more details. Code to create table: spark.sql("…

apache-spark pyspark hive bigdata iceberg

asked Oct 19 '22 at 07:49

Atif

2,011
9
23

0

votes

1 answer

Apache Flink & Iceberg: Not able to process hundreds of RowData types

I have a Flink application that reads arbitrary Avro data, maps it to RowData, and uses several FlinkSink instances to write data into Iceberg tables. By arbitrary data I mean that I have 100 types of Avro messages, all of them with a common…

apache-flink flink-streaming iceberg apache-iceberg

asked Oct 18 '22 at 09:47

nach0

379
1
3
14

0

votes

0 answers

How can I improve AWS Athena Iceberg read/write operations?

I have two identical tables; one created as the result of using a crawler on a .csv and the other an Iceberg table created with the following command: CREATE TABLE dan_grafana.iced ( meter string, readtime timestamp, kwh_total…

amazon-web-services amazon-athena iceberg

asked Oct 03 '22 at 23:32

Dan M

4,340
8
20

0

votes

1 answer

how to use replaceWhere option with Apache iceberg while writing data

I'm currently trying to write data using Iceberg to an external Hive table that is partitioned by the partition_date column. Before writing the data in Iceberg format, the test table has 2 rows, ("2015-01-02", "S01233",…

scala apache-spark apache-spark-sql iceberg apache-iceberg

asked Sep 20 '22 at 17:26

Leroy Mikenzi

792
6
22
46

0

votes

0 answers

spark2 cbo on iceberg table

I am using Spark 2.4 with an Iceberg table. I want to enable CBO, but I cannot find a way to compute table statistics. The table is created using the Iceberg Catalog API and data is populated using a Spark DataFrame. Is there a way to do that?

apache-spark iceberg apache-iceberg

asked Sep 19 '22 at 15:36

igreenfield

1,618
19
36

0

votes

1 answer

How to convert Dataset to List

I would like to know how to convert a Dataset to a List. I'm referring to: org.apache.avro.generic.GenericRecord org.apache.spark.sql.Dataset org.apache.spark.sql.Row Dataset data = spark.sql(SQL_QUERY) The result is different…

java apache-spark apache-spark-sql avro iceberg

asked Sep 11 '22 at 12:05

Roni Koren Kurtberg

495
1
8
18

0

votes

0 answers

AWS Athena Iceberg Multiple Row UPDATE

I am trying to update an Iceberg table on Athena, joined with another table. My script: UPDATE sampledb.icbg_tenure_20220905 a INNER JOIN sampledb.cit_update b ON a.msisdn_hash = b.msisdn_hash SET a.activation_date = b.activationdate Getting error…

amazon-web-services amazon-athena iceberg

asked Sep 09 '22 at 08:34

c0ng111

31
3

0

votes

1 answer

java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DynamicFileFilterWithCardinalityCheck has interface

While launching the spark-shell with iceberg dependencies, we got the following error: spark-shell \ --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.0 \ --conf…

apache-spark iceberg apache-iceberg

asked Aug 17 '22 at 08:28

Ranga Reddy

2,936
4
29
41

0

votes

1 answer

write apache iceberg table to azure ADLS / S3 without using external catalog

I'm trying to create an Iceberg-format table on cloud object storage. In the image below we can see that the Iceberg table format needs a catalog. This catalog stores the current metadata pointer, which points to the latest metadata. The Iceberg quick start…

iceberg

asked Aug 15 '22 at 13:00

ns15

5,604
47
51

0

votes

1 answer

Unable to query Iceberg table from PySpark script in AWS Glue

I'm trying to read data from an Iceberg table; the data is in ORC format and partitioned by column. I'm getting this error: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table…

amazon-web-services apache-spark pyspark aws-glue iceberg

asked Jul 27 '22 at 17:11

lightyagami96

336
1
4
14

0

votes

1 answer

"Iceberg query cannot be parsed" when trying to create Iceberg table with MAP column data type in Athena?

According to the Athena Iceberg documentation, the map type is supported. Why does neither of these statements work? CREATE TABLE iceberg_test1 (id string, themap map) LOCATION 's3://mybucket/test/iceberg1' TBLPROPERTIES ( 'table_type' = 'ICEBERG'…

amazon-athena iceberg apache-iceberg

asked May 19 '22 at 04:33

Alex R

11,364
15
100
180

0

votes

1 answer

Error when changing partition field in Iceberg, from spark

We are writing to Iceberg using Spark, and when renaming the partition field, we get a validation error: org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1) It…

apache-spark pyspark iceberg

asked Apr 19 '22 at 11:16

Itai Sevitt

140
1
7
