I am writing queries in Databricks using SQL on views and would like to calculate the max of update timestamp columns across multiple views. For instance, I am joining table a with table b and would like to know max(a.updt_ts, b.updt_ts). Since the MAX function cannot take more than one column, I want to create a function. Any help is greatly appreciated. Below is what I have:

CREATE TEMPORARY FUNCTION ufnGetMaxDt (@Date1 DATETIME2, @Date2 DATETIME2)
RETURNS DATETIME2
AS
BEGIN
    DECLARE @ret   DATETIME2
          , @MinDt DATETIME2;

    -- Fallback value for null inputs
    SET @MinDt = CAST('1900-01-01' AS DATETIME2);

    IF @Date1 IS NULL SET @Date1 = @MinDt;
    IF @Date2 IS NULL SET @Date2 = @MinDt;

    -- Return the later of the two dates
    SET @ret = CASE WHEN @Date1 >= @Date2
                    THEN @Date1
                    ELSE @Date2
                    END;

    IF @ret IS NULL
        SET @ret = @MinDt;  -- Dummy date
    RETURN @ret;
END
GO

1 Answer

You could just use GREATEST, e.g.:

SELECT *, GREATEST( date1, date2 ) xmax 
FROM tmp
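
One thing to watch: GREATEST in Spark SQL skips null inputs, but it still returns null when every argument is null. If you want the same '1900-01-01' fallback your function used, something like this would do it (tmp, date1 and date2 are just the placeholder names from above):

-- COALESCE supplies the dummy date only when both columns are null;
-- GREATEST already ignores a single null on its own
SELECT *,
       COALESCE( GREATEST( date1, date2 ), CAST( '1900-01-01' AS TIMESTAMP ) ) xmax
FROM tmp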

Or put them in an array, explode it and then take the MAX of that, e.g. something like this:

%sql
WITH cte AS
(
  SELECT *, EXPLODE( ARRAY( date1, date2 ) ) AS xmax
  FROM tmp
)
SELECT MAX( xmax )
FROM cte
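
Note that MAX there is an aggregate, so that query collapses everything to a single row: the latest timestamp across both columns and all rows. If you wanted one result per row instead, you'd group by a key, e.g. something like this (id is just a placeholder key column):

%sql
WITH cte AS
(
  -- one output row per input row per date column
  SELECT *, EXPLODE( ARRAY( date1, date2 ) ) AS xmax
  FROM tmp
)
-- collapse back to one row per key, keeping the latest date
SELECT id, MAX( xmax ) AS max_dt
FROM cte
GROUP BY id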

Seems a bit excessive when you can just use GREATEST, though. It's also worth having a read through the list of Spark SQL built-in functions; you don't have to remember them all, but it's useful to at least know what's possible:

https://spark.apache.org/docs/2.3.0/api/sql/index.html
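
If you're curious why the built-in is preferable, running EXPLAIN shows GREATEST inlined straight into the query plan, where Spark's optimizer can reason about it; a user-defined function would show up as an opaque call instead (tmp is the same placeholder view):

%sql
-- the plan prints greatest(date1, date2) as a plain expression, not a UDF call
EXPLAIN SELECT GREATEST( date1, date2 ) AS xmax FROM tmp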

How would I do that? Sorry, noob question. – codewalker Jul 06 '18 at 21:47
Not only is a built-in function easier to maintain, but it also helps Spark's optimizer choose the best physical plan. The reason is that Spark sees a user-defined function as a black box, and as such may miss out on some potential optimizations. – Dillon Jul 09 '18 at 12:06