
I am trying to delete duplicate values in a table, one partition (dt) at a time, but I am getting this error:

An error was encountered:
u'Cannot overwrite a path that is also being read from.;'

The query I am using is:

query = "SELECT DISTINCT * FROM {} WHERE dt = '{}'".format(table_name, partition_date)
df = spark.sql(query)
df.createOrReplaceTempView("temp_table")
overwrite_query = "INSERT OVERWRITE TABLE {} PARTITION (dt) SELECT * FROM temp_table".format(table_name)
spark.sql(overwrite_query)  # this is where the error is raised

Where am I wrong?

DariusB
  • have you saved your data before overwriting? – Ashok Aug 25 '23 at 08:54
  • Yes, its backup copy – DariusB Aug 25 '23 at 09:04
  • Basically, you are writing in the same path where you are reading from. And that's not possible. On this [answer](https://stackoverflow.com/questions/76959833/change-column-name-in-table-and-delta-files/76960618#76960618) you can see how to get around it and write where you are reading from. – cruzlorite Aug 25 '23 at 09:20

0 Answers