I know about the calendar change in Spark 3.0, and that it causes issues when parsing dates before the year 1582. However, I am trying to understand why the cast fails in this particular instance, since the year here is well after 1582.
from pyspark.sql import Row

# Single-column DataFrame with one date string
rdd = sc.parallelize(["3192016"])
df = rdd.map(lambda d: Row(date=d)).toDF()
df.createOrReplaceTempView("date_test")
sqlDF = spark.sql("SELECT to_date(date, 'yyyymmdd') FROM date_test")
sqlDF.show()
The call to show() fails with:
Py4JJavaError: An error occurred while calling o1519.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 167.0 failed 4 times, most recent failure: Lost task 10.3 in stage 167.0 (TID 910) (172.36.189.123 executor 3): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '3192016' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
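For reference, the two parser policies named in the error can be toggled like this (a minimal sketch based only on the config key and values quoted in the message above; I have not confirmed either produces the result I want):

# Restore pre-3.0 parsing behavior, as the error message suggests
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# Or keep the new parser and treat unparseable strings as invalid
# spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")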