
I'm working in EMR Studio, reading data from the AWS Glue Data Catalog.

When I try to query this data via Spark SQL, it throws the following error:

Error

Caused by: org.apache.spark.SparkUpgradeException: 
You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps 
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. 
See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. Or set spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the datetime values as it is.
    at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:159)
    at org.apache.spark.sql.execution.datasources.DataSourceUtils$.$anonfun$creteTimestampRebaseFuncInRead$1(DataSourceUtils.scala:209)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anon$4.addLong(ParquetRowConverter.scala:330)
    at org.apache.parquet.column.impl.ColumnReaderImpl$2$4.writeValue(ColumnReaderImpl.java:268)
    at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
    at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
    ... 21 more

I tried changing the following Spark settings, but neither worked:

spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
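For reference, a minimal sketch of how this flag is usually applied inside a PySpark session (the table name `my_db.my_table` is hypothetical, not from the question); `spark.conf.set` is a runtime session config, so it only affects Parquet scans executed after the call in the same session:

```python
# Sketch, assuming an existing SparkSession `spark` bound to the Glue Catalog.
# Set the rebase mode BEFORE running the query, in the same session.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")

# "my_db.my_table" is a hypothetical Glue Catalog table name.
df = spark.sql("SELECT * FROM my_db.my_table")
df.show()
```

If some other layer (e.g. a notebook magic or cluster default) re-creates the session or overrides the config afterwards, the value set here may not be the one in effect when the query runs.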

I also ran a select on the created view using PySpark code, and it worked without problems.

[screenshot of the working PySpark select omitted]

So it makes me think the problem only occurs when I use %sql.

Why does this happen? Am I doing something wrong?
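One way to make sure the flag is in place before any cell runs: EMR Studio notebooks backed by Livy/Sparkmagic accept a `%%configure` magic that sets Spark properties at session startup, before the session is created. This is a sketch, assuming the notebook uses Sparkmagic (the `int96RebaseModeInRead` key exists only on Spark 3.1+):

```
%%configure -f
{
    "conf": {
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED"
    }
}
```

Because the config is applied when the session starts rather than mid-session, it cannot be silently overridden by a later session re-creation.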

Rohit Nimmala
  • Was a solution to this found? – N. P. Jan 06 '22 at 15:37
  • 1
    For my part, the code is correct and we have done tests with the support of aws for video calls. I have raised a ticket in the amazon support. It is still under review. – Josefina Andrea Araya Tapia Jan 07 '22 at 16:47
  • The problem is that the flags you are setting are being overwritten, and you don't have control over that. [See here](https://stackoverflow.com/a/69040997/534238) for more info (it's Glue, but same idea). – Mike Williamson May 24 '22 at 09:46

0 Answers