0

I joined two tables and tried to get a max timestamp value of two timestamp fields. This is my SQL statement:

SELECT 
   rts.DataLakeModified_DateTime, 
   rtt.DataLakeModified_DateTime,
   (
     SELECT MAX(DataLakeModifiedDateTime)
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime)
   ) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID
        AND rtt.CHANNEL = rts.CHANNEL
        AND rtt.STORE = rts.STORE
        AND rtt.TERMINAL = rts.TERMINALID

But it has an error:

Error in SQL statement: AnalysisException: cannot evaluate expression outer(rts.DataLakeModified_DateTime) in inline table definition; line 6 pos 17

I'm using Spark3.2.1 and MSSQL database. And I'm running it on Azure Databricks notebook using 104 LTS cluster (include Apache Spark3.2.1, Scala 2.12).

Do you have any solution to solve this problem?

NinjaDev
  • 1,228
  • 11
  • 21
  • Put your `VALUES` in a `OUTER APPLY` after the joins – HoneyBadger Jun 24 '22 at 15:17
  • @HoneyBadger, thanks for your response. Do you mean I should use `LEFT OUTER JOIN` instead of `LEFT JOIN`? – NinjaDev Jun 24 '22 at 15:21
  • no, though an `OUTER APPLY` is semantically similar. Take a look [here](https://www.google.com/amp/s/sqlstudies.com/2013/05/20/the-many-uses-of-cross-apply/amp/), for what I mean (they use `CROSS APPLY`, which is more like an `INNER JOIM`) – HoneyBadger Jun 24 '22 at 15:24

2 Answers2

1

The error seems to be because of the usage of the following SQL statement to get the maximum timestamp value.

SELECT MAX(DataLakeModifiedDateTime) 
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime) 
  • I tried with 2 sample tables tb1 and tb2 which have timestamp fields t1 and t2.The following is how the tables look.

enter image description here

  • When I used to get maximum time stamp values using the same syntax that you used, it gave the same error.

enter image description here

  • Instead, when I tried it using the CASE statement, it gave the desired output.

enter image description here

So, modify your SQL query as shown below:

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   ( 
     CASE WHEN rts.DataLakeModified_DateTime > rtt.DataLakeModified_DateTime THEN rts.DataLakeModified_DateTime ELSE rtt.DataLakeModified_DateTime END 
) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 
Saideep Arikontham
  • 5,558
  • 2
  • 3
  • 11
  • It's working for me. thank you very much! Also I found another solution with `GREATEST` function, simply. ` GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime ` It looks like it's simply than `WHEN ... THEN ...` clause. – NinjaDev Jun 24 '22 at 16:07
0

I found another solution with GREATEST function, simply.

GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 

It looks like it's simply than WHEN ... THEN ... clause.

Finally, the SQL statement is :

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 

It worked correctly. Thank you!

NinjaDev
  • 1,228
  • 11
  • 21