
I am new to Spark SQL. We are migrating data from SQL Server to Databricks, and I am using Spark SQL. Can you please suggest how to achieve the functionality of the date functions below in Spark SQL? I can see that datediff only gives days in Spark SQL.

DATEDIFF(YEAR,StartDate,EndDate)
DATEDIFF(Month,StartDate,EndDate) 
DATEDIFF(Quarter,StartDate,EndDate)
Alex Ott
SQLGirl

3 Answers


As you have mentioned, Spark SQL does support DATEDIFF, but for days only. I would also be careful, as the parameters appear to be the opposite way round in Spark, i.e.

--SQL Server
DATEDIFF ( datepart , startdate , enddate )

--Spark
DATEDIFF ( enddate , startdate )

Spark does, however, support a similar function called months_between, which you could use in place of DATEDIFF( month .... This function returns a decimal amount, so optionally cast it to INT for functionality similar to the SQL Server version:

SELECT startDate, endDate, 
  DATEDIFF( endDate, startDate ) AS diff_days,
  CAST( months_between( endDate, startDate ) AS INT ) AS diff_months      
FROM yourTable
ORDER BY 1;

There are also year and quarter functions for determining the year and quarter of a date respectively. You could simply subtract the years, but quarters would be trickier. You may have to 'do the math' or end up using a calendar table.
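'Doing the math' for quarters could be sketched as below, reusing the yourTable, startDate, and endDate names from the query above. Counting quarter boundaries crossed (years apart times four, plus the quarter offset) mirrors the boundary-crossing semantics of SQL Server's DATEDIFF(QUARTER, ...); this is a sketch, not tested against every edge case:

```sql
SELECT startDate, endDate,
  year( endDate ) - year( startDate )                 AS diff_years,
  -- quarter boundaries crossed, like SQL Server's DATEDIFF(QUARTER, ...)
  ( year( endDate ) - year( startDate ) ) * 4
    + ( quarter( endDate ) - quarter( startDate ) )   AS diff_quarters
FROM yourTable
ORDER BY 1;
```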

wBob
  • Thanks very much. It really helps. Yes, Quarter difference looks tricky. – SQLGirl Oct 01 '18 at 03:12
  • Just to clarify, SQL Server seems to require DATEDIFF(datepart, recentDate, olderDate), as startdate and enddate are a bit nebulous. – nrmad Aug 17 '21 at 11:13

As Spark doesn't provide the other units, I use the method below:

select 
    (bigint(to_timestamp(endDate))) - (bigint(to_timestamp(startDate))) as time_diff

This returns the difference in seconds, so dividing by 60 or 3600 converts it to minutes or hours.
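That conversion could be sketched as follows, assuming the same yourTable, startDate, and endDate names used in the accepted answer:

```sql
SELECT
  -- epoch-second difference, then converted to coarser units
  bigint(to_timestamp(endDate)) - bigint(to_timestamp(startDate))            AS diff_seconds,
  ( bigint(to_timestamp(endDate)) - bigint(to_timestamp(startDate)) ) / 60   AS diff_minutes,
  ( bigint(to_timestamp(endDate)) - bigint(to_timestamp(startDate)) ) / 3600 AS diff_hours
FROM yourTable;
```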

박재홍

Recent versions of Spark support datediff(endDate, startDate), which returns the difference in days.

Sam