(Py)Spark to_date converts 31-DEC-98 to 2098-12-31. Is there a way to make it 1998-12-31?
The documentation does not mention an option for choosing the century (19xx or 20xx) when parsing a two-digit year:
to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
grade_type = spark.read\
    .option("header", "true")\
    .option("nullValue", "")\
    .option("inferSchema", "true")\
    .csv("student/GRADE_TYPE_DATA_TABLE.csv")
grade_type.show(3)
-----
+---------------+-----------+----------+------------+-----------+-------------+
|GRADE_TYPE_CODE|DESCRIPTION|CREATED_BY|CREATED_DATE|MODIFIED_BY|MODIFIED_DATE|
+---------------+-----------+----------+------------+-----------+-------------+
|             FI|      Final|  MCAFFREY|   31-DEC-98|   MCAFFREY|    31-DEC-98|
|             HM|   Homework|  MCAFFREY|   31-DEC-98|   MCAFFREY|    31-DEC-98|
|             MT|    Midterm|  MCAFFREY|   31-DEC-98|   MCAFFREY|    31-DEC-98|
+---------------+-----------+----------+------------+-----------+-------------+
from pyspark.sql.functions import col, to_date

# Same read as above, then parse both date columns with a two-digit-year pattern
grade_type = spark.read\
    .option("header", "true")\
    .option("nullValue", "")\
    .option("inferSchema", "true")\
    .csv("student/GRADE_TYPE_DATA_TABLE.csv")\
    .withColumn("CREATED_DATE", to_date(col('CREATED_DATE'), "dd-MMM-yy"))\
    .withColumn("MODIFIED_DATE", to_date(col('MODIFIED_DATE'), "dd-MMM-yy"))
grade_type.show(3)
-----
+---------------+-----------+----------+------------+-----------+-------------+
|GRADE_TYPE_CODE|DESCRIPTION|CREATED_BY|CREATED_DATE|MODIFIED_BY|MODIFIED_DATE|
+---------------+-----------+----------+------------+-----------+-------------+
|             FI|      Final|  MCAFFREY|  2098-12-31|   MCAFFREY|   2098-12-31|
|             HM|   Homework|  MCAFFREY|  2098-12-31|   MCAFFREY|   2098-12-31|
|             MT|    Midterm|  MCAFFREY|  2098-12-31|   MCAFFREY|   2098-12-31|
+---------------+-----------+----------+------------+-----------+-------------+
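
To make the desired behavior concrete, here is a minimal plain-Python sketch of a "pivot year" rule: read the two-digit year as 20yy, and move it back one century if that lands after the pivot. The function name parse_dd_mon_yy, the MONTHS table, and the pivot_year parameter are hypothetical names made up for illustration; they are not part of Spark.

```python
from datetime import date

# English month abbreviations as they appear in the CSV (hypothetical helper table).
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_dd_mon_yy(text, pivot_year=None):
    """Parse 'dd-MON-yy' text; years that would fall after pivot_year go back 100 years.

    Returns None for None or unparseable input, loosely mirroring how to_date
    returns null on invalid input. pivot_year defaults to the current year,
    which is an assumption that the data holds no future dates.
    """
    if text is None:
        return None
    if pivot_year is None:
        pivot_year = date.today().year
    try:
        day_str, mon_str, yy_str = text.strip().split("-")
        year = 2000 + int(yy_str)      # start from the 20yy reading
        if year > pivot_year:          # a date in the future is assumed to be 19yy
            year -= 100
        return date(year, MONTHS[mon_str.upper()], int(day_str))
    except (ValueError, KeyError):
        return None


print(parse_dd_mon_yy("31-DEC-98", pivot_year=2024))
```

A function like this could presumably be wrapped with pyspark.sql.functions.udf and DateType() and applied to both columns, at the usual cost of a Python UDF. If every date in the file is known to be in the 1900s, applying add_months with -1200 to the to_date result may be a simpler, Spark-native alternative.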