Here's my use case: I'm migrating out of an old DWH, into Databricks (DBR 10.4 LTS). When moving dimension tables into Databricks, I'd like old SKs (surrogate keys) to be maintained, while creating the SKs column in Databricks Delta as an IDENTITY column, so new dimension values get a new SK, unique over the older SKs coming from the old DWH.
For example, if I have a table d_something, with 2 columns (sk, bk) containing one row:
sk = 12, bk = 'ABC'
I'll copy this into a new Databricks Delta table, and when I insert a new row:
INSERT into d_something (bk)
VALUES ('DEF')
A new SK be generated, so:
sk = 12, bk = 'ABC'
sk = 13, bk = 'DEF'
(doesn't have to be sequential, only uniqueness is mandatory).
By this: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-alter-table.html
I imagine this should be possible to create the table, populate it manually with old SKs, then alter the SK column into IDENTITY (using SYNC IDENTITY?).
Managed to create a fresh table with IDENTITY column, such as:
CREATE TABLE sk_get_test_1 (
sk BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
bk STRING
)
But manually populating the SK column is blocked as IDENTITY columns cannot be manually populated.
Hence my looking for an alternative approach: create it as a regular column, populate the old SKs, and then alter to IDENTITY column. Cant figure how ot make it work though.
Any other ideas here?
Thanks!
Creating the table with IDENTITY column works fine:
CREATE TABLE sk_get_test_1 (
sk BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
bk STRING
)
But trying to insert into the IDENTITY column:
insert into sk_get_test_1 (sk, bk)
values
(1, 'b');
Returns:
AnalysisException: Providing values for GENERATED ALWAYS AS IDENTITY column sk is not supported
Tried ALTER COLUMN
, meaning altering an already populated bigint column into IDENTITY type, based on:
{ { ALTER | CHANGE } [COLUMN] { column_identifier | field_name }
{ COMMENT comment |
{ FIRST | AFTER column_identifier } |
{ SET | DROP } NOT NULL |
SET DEFAULT clause |
DROP DEFAULT |
SYNC IDENTITY } }
(source)
But keeps getting ParseException
errors which I can't overcome.