0

In Databricks I have a existing delta table, In which i want to add one more column, as Id so that each row has unique id no and It is consecutive (how primary key is present in sql).

So far I have tried converting delta table to pyspark dataframe and have added new column as


from pyspark.sql.window import Window as W
from pyspark.sql import functions as F
df1 = df1.withColumn("idx", F.monotonically_increasing_id())
windowSpec = W.orderBy("idx")
df1 = df1.withColumn("idx", F.row_number().over(windowSpec)).show()

I tried writing it back to delta table,

df.write.mode("append").format("delta").save(location/db.tablename)

It writes back but the data values After querying is null for the new id column. I read, overwrite mode will erase all previous data. How can I bring the column id data to delta table and keep incrementing the id column when data gets inserted?

I am trying to achieve adding a autoincrement column for delta table. The databricks runtime is 7.3.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132

1 Answers1

1

I am trying to achieve adding a autoincrement column for delta table. The databricks runtime is 7.3.

from official document,

Identity column feature is supported for runtime version 10.4 and later not for below runtime 10.4

Altering table by adding new Column with Identity is also not supported

To achieve your goal first you have to migrate from runtime 7.3 to runtime 10.4 And create new table with identity column and then copy the data from first table to new table.

Pratik Lad
  • 4,343
  • 2
  • 3
  • 11