1

I have this table:

+----------+-------------+-------------------+------------------+
|    userId|       testId|               date|              note|
+----------+-------------+-------------------+------------------+
| 123123123|            1|2019-01-22 02:03:00|               aaa|
| 123123123|            1|2019-02-22 02:03:00|               bbb|
| 123456789|            2|2019-03-23 02:03:00|               ccc|
| 123456789|            2|2019-04-23 02:03:00|               ddd|
| 321321321|            3|2019-05-23 02:03:00|               eee|
+----------+-------------+-------------------+------------------+

Would like to get newest note (whole row) for each group userId and testId:

SELECT
    n.userId,
    n.testId,
    n.date,
    n.note
FROM 
    notes n
INNER JOIN (
    SELECT 
        userId,
        testId,
        MAX(date) as maxDate
    FROM 
        notes
    GROUP BY 
        userId,
        testId
) temp ON n.userId = temp.userId AND n.testId = temp.testId AND n.date = temp.maxDate

It works.

But now I'd like to also have previous note in each row:

+----------+-------------+-------------------+-------------+------------+
|    userId|       testId|               date|         note|previousNote|
+----------+-------------+-------------------+-------------+------------+
| 123123123|            1|2019-02-22 02:03:00|          bbb|         aaa|
| 123456789|            2|2019-04-23 02:03:00|          ddd|         ccc|
| 321321321|            3|2019-05-23 02:03:00|          eee|        null|
+----------+-------------+-------------------+-------------+------------+

Have no idea how to do it. I heard about LAG() function which might be useful but found no good examples for my case.

I'd like to use it on dataframe in pyspark (if it's important)

Robin71
  • 383
  • 5
  • 26
  • if there will be more than two records with same notes than all previous records should be in single column or different column. @robin71 –  Jun 03 '19 at 07:28
  • @LovePandey I need only one previous value, so it'll always be one column. – Robin71 Jun 03 '19 at 07:31
  • i thing you are looking such answer which i have given you below. please check out my answer and suggest me. –  Jun 03 '19 at 09:28

2 Answers2

2

use lag() and row_number analytic function

select userid,testid,date,note,previous_note
from
(select userid,testid,date,note,
lag(note)over(partition by useid,testid order by date) as previous_note,
row_number() over(partition by userid,testid order by date desc) rn
from table_name
) a where a.rn=1
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
  • Thanks for answer. There is something wrong because query returns only nulls in `previous_note` column – Robin71 Jun 03 '19 at 08:02
  • OK, I replaced lead() with lag() and it looks it works. Please, edit you answer if you agree. – Robin71 Jun 03 '19 at 08:08
  • i think it is not working well. please check once again - @Robin71 –  Jun 03 '19 at 09:30
1
select userid,testid,date,note,previous_note from
(select userid,testid,date,note,lead(note)
over(partition by userid,testid order by date desc) as previous_note,
row_number() over(partition by userid,testid order by date desc) srno
from Table_Name
) a where a.srno=1

I hope it will give you right answer which you want. it will give you latest date as new record and previous date note as previous_Note.