I am using Spark 2.4.5 with Java 8 in a job that writes data to an S3 path. The job was accidentally triggered multiple times, which created duplicate records. I am now trying to remove the duplicates from the S3 path using Databricks.
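For context, the write side of the job looks roughly like this (a simplified sketch, not the exact job code; the bucket path and input location are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class FinalValsWriter {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("final-vals-writer")
                    .getOrCreate();

            // Placeholder input path; the real job reads from another S3 location.
            Dataset<Row> vals = spark.read().parquet("s3://my-bucket/input/");

            // The table is Delta and the job appends, so each accidental
            // re-trigger wrote the same records again, producing duplicates.
            vals.write()
                    .mode(SaveMode.Append)
                    .format("delta")
                    .save("s3://my-bucket/final_vals/");
        }
    }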
When I try to run the DELETE below against the table "final_vals":
%sql
delete from final_vals
where rank1 in (
  select rank1
  from (
    select row_number() over (
             partition by id, data_date, data_type, data_value, version_id
             order by create_date, last_update_date
           ) as rank1
    from final_vals
  )
  where rank1 <> 1
);
it fails with this error:
Error in SQL statement: DeltaAnalysisException: Multi-column In predicates are not supported in the DELETE condition.
How can I fix this? What am I doing wrong here?