2

The goal of this query is to delete the records that have the same keys of the new dataframe.

This is the query that I'm executing:

val op=spark.sql(s"""
DELETE 
FROM TABLE1 AS t 
WHERE EXISTS (
    SELECT 1 
    FROM TABLE2 AS s 
    WHERE t.DAY=s.DAY 
    AND t.DATA_STREAM=s.DATA_STREAM
)""")

The error that I'm getting is:

AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate, Window, or Generate, but found:
 exists(t.DAY, t.DATA_STREAM)
in operator DeleteCommandEdge com.databricks.sql.transaction.tahoe.DeltaLog@49e1c8fd, exists#107963 [DAY#108138 && DATA_STREAM#108193]
               ;
DeleteCommandEdge com.databricks.sql.transaction.tahoe.DeltaLog@49e1c8fd, exists#107963 [DAY#108138 && DATA_STREAM#108193]```

As far as I know exists is deterministic. Is there another way to achieve the same result with another query?

The query is running on a Databricks cluster with Spark 3.2.1

ultraInstinct
  • 4,063
  • 10
  • 36
  • 53
  • At this moment I'm not sure why this error happens, but regarding another approaches as I see you are using delta format you may try Delta table merge `whenMatched()...delete` construct to have same result. Also you may try to execute subquery and persist it before you actually do delete. That way you will not have dependency of deleting table itself (I'm just guessing this may be an issue). – partlov Dec 21 '22 at 18:11

0 Answers0