I have an existing table, which I'd like to append two columns to. I create a Spark dataframe:
spark_df = spark.createDataFrame(df)
Then I'd like to use MERGE INTO like so:
spark.sql(f"""MERGE INTO x.y AS m
USING {spark_df} AS s
ON m.id = s.id
WHEN MATCHED THEN
UPDATE SET m.id = s.id
WHEN NOT MATCHED
THEN INSERT (id, colnew_1) VALUES (id, spark_df[["myCol"]])""")
I get a syntax error when the query tries to parse spark_df. Is this functionality possible? I understand that a Delta table must be created first so that the MERGE operation is supported, but I'm a bit confused about the sequence of events. For example, I can create a Delta table like so:
CREATE TABLE x.y_delta (id bigint, colnew_1 bigint) USING delta
However, this table is empty. I suppose an intermediate step is to copy the original table in full into this new Delta table, and then run the merge against it. Though I'm not convinced that's right either.
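For context, the full sequence I'm imagining would look roughly like the sketch below. The temp view step is my guess at how the dataframe should be referenced from SQL (rather than interpolating the Python object into the query string); the names x.y, x.y_delta, "updates", and myCol are just placeholders, and this assumes a Spark session with the Delta Lake extensions enabled.

```python
# 1. Copy the original table into a new Delta table, adding the new
#    column as NULLs so the schema matches what MERGE will insert.
spark.sql("""
    CREATE TABLE x.y_delta USING delta AS
    SELECT *, CAST(NULL AS bigint) AS colnew_1 FROM x.y
""")

# 2. Expose the dataframe to SQL as a temp view instead of putting
#    the Python object into the f-string.
spark_df.createOrReplaceTempView("updates")

# 3. Merge against the Delta table, referencing the view by name.
spark.sql("""
    MERGE INTO x.y_delta AS m
    USING updates AS s
    ON m.id = s.id
    WHEN MATCHED THEN UPDATE SET m.colnew_1 = s.myCol
    WHEN NOT MATCHED THEN INSERT (id, colnew_1) VALUES (s.id, s.myCol)
""")
```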