
I am currently using Iceberg in my project, and I have a question about it.

My Current Scenario:

  1. I have loaded the data into my Iceberg table using a Spark DataFrame (I am doing this through a Spark job):

    df.writeTo("catalog.mydb.test2").using("iceberg").create()

  2. Now, on the source side, I have added two columns and started the job that performs the merge:

    df.createOrReplaceTempView("myview")
    spark.sql("""
      MERGE INTO catalog.mydb.test2 AS t
      USING (SELECT * FROM myview) AS s
      ON t.id = s.id
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *
    """)

After doing both of these steps, I expected the new columns to be added to the target table, but it did not work.

As far as I can see, Iceberg supports full schema evolution. What does that mean, if it is not adding any columns dynamically to my target table?

Please help: how can I add new columns to my target table dynamically?

  • https://iceberg.apache.org/docs/latest/spark-ddl/#spark-ddl – RakeshV Aug 20 '22 at 05:18
  • Iceberg lets you **manage** the schema evolution, but it won't do it for you. It doesn't make sense to add columns 'dynamically'; even though it might be convenient in some cases, in most cases it can cause unmanageable catastrophes. You need to explicitly add new columns to your tables. – shay__ Aug 28 '22 at 08:11
  • You can use Spark SQL for it: `ALTER TABLE ... ADD COLUMNS` (see the sketch after this list). – Oscar Drai Aug 22 '23 at 14:13
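For reference, a minimal sketch of the explicit approach suggested in the comments above, using Spark SQL DDL against the Iceberg table. The column names and types here are placeholders for illustration, not ones given in the question:

    // Explicitly evolve the target table's schema before running the MERGE.
    // new_col1/new_col2 are hypothetical names; use the columns added on the source side.
    spark.sql("""
      ALTER TABLE catalog.mydb.test2
      ADD COLUMNS (new_col1 string, new_col2 int)
    """)

Once the columns exist on the target, the MERGE above can populate them through UPDATE SET * and INSERT *.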

1 Answer


You can enable this with the merge-schema option, but we don't recommend it because, as @shay__ points out, it can sometimes cause unmanageable catastrophes. – liliwei
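A minimal sketch of what this can look like, based on the schema-merge behavior described in the Iceberg Spark writes documentation. Note that this applies to DataFrame writes rather than the MERGE INTO statement itself, and the exact option name (merge-schema vs. mergeSchema) and the write.spark.accept-any-schema table property requirement can vary by Iceberg version, so verify the details against the version you run:

    // Assumption: the table must first be allowed to accept writes
    // whose schema differs from its current schema.
    spark.sql("""
      ALTER TABLE catalog.mydb.test2
      SET TBLPROPERTIES ('write.spark.accept-any-schema' = 'true')
    """)

    // Write with schema merging enabled: columns present in df but
    // missing from the table are added to the table schema.
    df.writeTo("catalog.mydb.test2")
      .option("merge-schema", "true") // "mergeSchema" in some versions
      .append()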

  • Where do we add this option exactly? It would be a great help if you could name a few of the unmanageable catastrophes, because if the use case is that a table should accept incoming data even if a few columns are missing or there are extra columns, this should be a good solution, shouldn't it? – isrj5 Apr 12 '23 at 04:53