
I am trying to convert a string in `ddMMyy` format to `yyyyMMdd` using the `to_date` function.

But Spark casts the date into the 1900s.

For example: I tried to convert `150545` to `20450515` but got `19450515`.

    # my_date = '150545'
    from pyspark.sql.functions import expr, lit

    df = df.withColumn('sorce_format', lit('ddMMyy'))
    df = df.withColumn('target_format', lit('yyyyMMdd'))

    def cast_date_fields(df):
        df = df.withColumn(
            "data_ok",
            expr("to_date(to_date(mydate, sorce_format), target_format)").cast('string'))
        return df

In a Jupyter notebook the cast works fine, but running in AWS Glue it converts the date to the 1900s.

Eriton Silva
  • How is this supposed to work though? How is Spark supposed to know whether you want 1945 or 1845? – Robert Kossendey Oct 11 '21 at 15:37
  • that's a string, just manipulate it a little bit and add `'20'` at the beginning of the year. – Steven Oct 11 '21 at 15:48
  • Spark would always output the year `20xx`: _For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive._ ([Link](https://spark.apache.org/docs/3.1.1/sql-ref-datetime-pattern.html)). So the issue is probably related to glue. Btw: you could replace the outer `to_date` with [date_format](https://spark.apache.org/docs/3.1.1/api/sql/index.html#date_format) – werner Oct 11 '21 at 16:27

0 Answers