I am trying to update multiple columns in one Delta table based on values fetched from another Delta table. The UPDATE statement below works in Oracle but fails in Spark Delta; can you please help?
deptDf = sqlContext.createDataFrame(
    [(10, "IT", "Seattle"), (20, "Accounting", "Renton"), (30, "Finance", "Bellevue"),
     (40, "Manufacturing", "Tacoma"), (50, "Inventory", "Bothell")],
    ("dno", "dname", "location"))
updateddeptlocDf = sqlContext.createDataFrame(
    [(20, "Accounting and Finance", "SODO"), (10, "Technology", "SODO")],
    ("dno", "updated_name", "updated_location"))
deptDf.write.format("delta").mode("overwrite").save("/mnt/delta/dept")
updateddeptlocDf.write.format("delta").mode("overwrite").save("/mnt/delta/updatedDept")
spark.sql("DROP TABLE IF EXISTS deptdelta")
spark.sql("DROP TABLE IF EXISTS updated_dept_location")
spark.sql("CREATE TABLE deptdelta USING DELTA LOCATION '/mnt/delta/dept'")
spark.sql("CREATE TABLE updated_dept_location USING DELTA LOCATION '/mnt/delta/updatedDept'")
And the UPDATE statement I am trying to issue, which fails, is:
UPDATE deptdelta d
SET (d.dname, d.location) = (SELECT ud.updated_name, ud.updated_location FROM updated_dept_location u WHERE d.dno = u.dno )
WHERE EXISTS (SELECT 'a' from updated_dept_location u1 WHERE d.dno = u1.dno )
The error:

Error in SQL statement: ParseException: mismatched input ',' expecting EQ (line 2, pos 11)

== SQL ==
UPDATE deptdelta d
SET d.dname, d.location = (SELECT ud.updated_name, ud.updated_location FROM updated_dept_location u WHERE d.dno = u.dno )
-----------^^^
WHERE EXISTS (SELECT 'a' from updated_dept_location u1 WHERE d.dno = u1.dno )
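From what I can tell, Delta's UPDATE does not support Oracle's multi-column `SET (col1, col2) = (subquery)` form, and MERGE INTO appears to be the suggested way to update one table from the rows of another. Is something like the following the right replacement? (A sketch of what I think the equivalent would be; I have not been able to verify it on my cluster. Note that the subquery in my original statement aliases the table as `u` but references `ud`, which I have dropped here.)

```sql
-- Update matching rows in deptdelta from updated_dept_location,
-- joining on the department number; unmatched rows are left untouched.
MERGE INTO deptdelta d
USING updated_dept_location u
ON d.dno = u.dno
WHEN MATCHED THEN
  UPDATE SET d.dname = u.updated_name, d.location = u.updated_location
```

If that is correct, I would presumably run it via `spark.sql(...)` the same way as the table-creation statements above.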