
I know about the calendar change in Spark 3.0, and I am trying to understand why the cast fails in this particular instance. Spark 3.0 has issues with dates before the year 1582; however, in this example the year is greater than 1582.

from pyspark.sql import Row

rdd = sc.parallelize(["3192016"])
# the original snippet used an undefined `row` helper; a Row with a `date` column is assumed from the query below
df = rdd.map(lambda value: Row(date=value)).toDF()
df.createOrReplaceTempView("date_test")
sqlDF = spark.sql("SELECT to_date(date, 'yyyymmdd') FROM date_test")
sqlDF.show()

Fails with

Py4JJavaError: An error occurred while calling o1519.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 167.0 failed 4 times, most recent failure: Lost task 10.3 in stage 167.0 (TID 910) (172.36.189.123 executor 3): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '3192016' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

1 Answer


You just need to set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behaviour of previous versions.

The error message itself points to the fix:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '3192016' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

Here is how you can do it in Python:

spark.sql("set spark.sql.legacy.timeParserPolicy=CORRECTED")

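For completeness, here is a minimal end-to-end sketch of the fix. It assumes the same spark session and sc context as the question, and the Row mapping stands in for the question's unshown row helper:

from pyspark.sql import Row

# restore the pre-3.0 SimpleDateFormat-based parsing
spark.sql("SET spark.sql.legacy.timeParserPolicy=LEGACY")

rdd = sc.parallelize(["3192016"])
df = rdd.map(lambda value: Row(date=value)).toDF()
df.createOrReplaceTempView("date_test")

# with LEGACY the lenient parser accepts the string instead of raising
# SparkUpgradeException; with CORRECTED it would be treated as an
# invalid datetime string and return null instead
spark.sql("SELECT to_date(date, 'yyyymmdd') FROM date_test").show()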

SCouto