We are migrating a huge codebase from Spark 2.x to Spark 3.x. To make the migration incremental, some configs were set to `legacy` so that behavior stays the same as in Spark 2.x. The `add_months` function, however, AFAIK has no legacy mode. According to the Spark 3 migration guide:
> In Spark 3.0, the `add_months` function does not adjust the resulting date to the last day of the month if the original date is the last day of the month. For example, `select add_months(DATE'2019-02-28', 1)` results in `2019-03-28`. In Spark version 2.4 and below, the resulting date is adjusted when the original date is the last day of the month. For example, adding a month to `2019-02-28` results in `2019-03-31`.
So Spark 2.x adjusts the resulting date to the last day of the month, while Spark 3 does not. The obvious solution would be to write a wrapper around `add_months`, but I wonder whether Spark 3 has any configuration that restores the Spark 2 behavior.
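The two behaviors can be reproduced with plain `java.time` dates, without a Spark session: `LocalDate.plusMonths` happens to match the Spark 3 semantics (keep the day of month, clamping only when the target month is shorter), while the Spark 2 end-of-month adjustment has to be added by hand. A minimal sketch (the object and method names here are mine, for illustration only):

```scala
import java.time.LocalDate

object AddMonthsDemo {
  // Spark 3-style semantics: keep the day of month, clamping only when the
  // target month is shorter. This is exactly what LocalDate.plusMonths does.
  def addMonthsSpark3Style(d: LocalDate, n: Int): LocalDate = d.plusMonths(n)

  // Spark 2-style semantics: additionally, if the input is the last day of
  // its month, snap the result to the last day of the target month.
  def addMonthsSpark2Style(d: LocalDate, n: Int): LocalDate = {
    val shifted = d.plusMonths(n)
    if (d.getDayOfMonth == d.lengthOfMonth) shifted.withDayOfMonth(shifted.lengthOfMonth)
    else shifted
  }

  def main(args: Array[String]): Unit = {
    val feb28 = LocalDate.of(2019, 2, 28)
    println(addMonthsSpark3Style(feb28, 1)) // 2019-03-28
    println(addMonthsSpark2Style(feb28, 1)) // 2019-03-31
  }
}
```

For a non-end-of-month input such as `2020-01-15`, both variants return the same date, which is why the wrapper below only branches on the last-day check.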
EDIT:
I ended up implementing a wrapper around `add_months` in Scala for Spark 3.x:
```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{add_months => add_months_spark, last_day, lit, when}

object functions {

  def add_months(startDate: Column, numMonths: Int): Column =
    add_months(startDate, lit(numMonths))

  def add_months(startDate: Column, numMonths: Column): Column = {
    // Spark 3 result, via the built-in function (aliased on import so it
    // doesn't clash with this wrapper).
    val addedMonthsSpark = add_months_spark(startDate, numMonths)
    // Restore the Spark 2.x behavior: if the input is the last day of its
    // month, snap the result to the last day of the target month.
    val startDateIsLastDay = last_day(startDate) === startDate
    when(startDateIsLastDay, last_day(addedMonthsSpark)).otherwise(addedMonthsSpark)
  }
}
```