0
WITH cte AS  
( SELECT *, ROW_NUMBER() OVER (PARTITION BY columnname 
ORDER BY columnname) row_num FROM  tablename 
)  
DELETE  FROM cte WHERE row_num > 1;

I am using these query for removing duplicate records from my deltatable, but I get error:

AssertionError: assertion failed: No plan for DeleteFromTable

blackraven
  • 5,284
  • 7
  • 19
  • 45

1 Answers1

0

I tried to reproduce the above and I got the same error.

enter image description here

when I expand the error, it is showing the table as parquet. I tried to convert it to delta table but got the same error even after that.

If you are not able to resolve the above error, you can try this workaround with the help of Pyspark dataframe.

This is my sample delta table with some duplicate values.

enter image description here

First select the rows which has rownum as 1 from CTE. Then overwrite these values with your delta table.

Selecting required rows:

Code:

%sql
with cte as
(
select *,row_number() over(Partition by ID order by Name) as row_num from mytable
) select Id,Name,Age from cte where row_num=1

enter image description here

_sqldf is a pyspark dataframe which has latest SQL cell result.

Use this dataframe and overwrite to your delta table.

%python
_sqldf.write.mode("overwrite").format("delta").save("delta table path")

enter image description here

Rakesh Govindula
  • 5,257
  • 1
  • 2
  • 11