
I think I am seeing a bug in Spark where mode 'overwrite' is not respected; instead, an exception is thrown on an attempt to saveAsTable into a table that already exists, even though mode 'overwrite' is set.

Below is a little scriptlet that reproduces the issue. The last statement results in a stack trace reading:

 org.apache.spark.sql.AnalysisException: Table `example` already exists.;

Any advice much appreciated.

spark.sql("drop table if exists example ").show()
case class Person(first: String, last: String, age: Integer)
val df = List(
    Person("joe", "x", 9),
    Person("fred", "z", 9)).toDF()
df.write.option("mode","overwrite").saveAsTable("example")

val recover1 = spark.read.table("example")
recover1.show()


val df3 = List(
    Person("mouse", "x", 9),
    Person("golf", "z", 9)).toDF()

 df3.write.
    option("mode","overwrite").saveAsTable("example")      

val recover4 = spark.read.table("example")
recover4.show()     

1 Answer


saveAsTable doesn't check the extra options for a save mode; set it with mode directly:

import org.apache.spark.sql.SaveMode

df3.write.mode(SaveMode.Overwrite).saveAsTable("example")

or

df3.write.mode("overwrite").saveAsTable("example")
Gelerion
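
For completeness, a minimal sketch of the question's reproduction script with that fix applied, assuming a spark-shell session and reusing the Person case class and example table name from the question:

import org.apache.spark.sql.SaveMode
import spark.implicits._  // automatic in spark-shell

case class Person(first: String, last: String, age: Integer)

// first write creates the table (or overwrites it if it already exists)
val df = List(Person("joe", "x", 9), Person("fred", "z", 9)).toDF()
df.write.mode(SaveMode.Overwrite).saveAsTable("example")

// second write now overwrites the existing table instead of throwing
val df3 = List(Person("mouse", "x", 9), Person("golf", "z", 9)).toDF()
df3.write.mode(SaveMode.Overwrite).saveAsTable("example")

spark.read.table("example").show()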
  • spot on! thnx ;^) – Chris Bedford Aug 06 '19 at 05:44
  • This does not work. I tried `df1.write.mode('overwrite').saveAsTable('policy1')`, but I still get the same error: Can not create the managed table('`policy1`'). The associated location('spark-warehouse/policy1') already exists. – Durga Swaroop Jul 22 '20 at 21:32
  • @DurgaSwaroop did you find the solution to this? I'm getting the "already exists" error as well – kev Dec 03 '20 at 08:34
  • "The associated location already exists" is a bit of a different case; have you tried setting `spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")` as described here - https://kb.databricks.com/jobs/spark-overwrite-cancel.html (see the sketch after these comments)? – Gelerion Dec 04 '20 at 08:01
  • That is not supported anymore with PySpark > 3 – Muhammad Raihan Muhaimin Apr 10 '22 at 21:43
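
Tying that exchange together, a rough sketch of the Spark 2.4-era workaround in a spark-shell session; the policy1 table name and sample rows are placeholders echoing the comments above, and per the last comment the legacy flag was removed in Spark 3:

import org.apache.spark.sql.SaveMode
import spark.implicits._  // automatic in spark-shell

// Spark 2.4 only: allow creating a managed table over a non-empty
// warehouse location (this flag no longer exists in Spark 3)
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")

// placeholder data standing in for the df1 from the comment above
val df1 = List(("p1", 100), ("p2", 200)).toDF("policy", "amount")
df1.write.mode(SaveMode.Overwrite).saveAsTable("policy1")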