I have this table:
+----------+-------------+-------------------+------------------+
|    userId|       testId|               date|              note|
+----------+-------------+-------------------+------------------+
| 123123123|            1|2019-01-22 02:03:00|               aaa|
| 123123123|            1|2019-02-22 02:03:00|               bbb|
| 123456789|            2|2019-03-23 02:03:00|               ccc|
| 123456789|            2|2019-04-23 02:03:00|               ddd|
| 321321321|            3|2019-05-23 02:03:00|               eee|
+----------+-------------+-------------------+------------------+
I would like to get the newest note (the whole row) for each group of userId and testId. This is my current query:
SELECT
    n.userId,
    n.testId,
    n.date,
    n.note
FROM
    notes n
INNER JOIN (
    SELECT
        userId,
        testId,
        MAX(date) AS maxDate
    FROM
        notes
    GROUP BY
        userId,
        testId
) temp ON n.userId = temp.userId AND n.testId = temp.testId AND n.date = temp.maxDate
It works.
But now I'd also like to have the previous note in each row:
+----------+-------------+-------------------+-------------+------------+
|    userId|       testId|               date|         note|previousNote|
+----------+-------------+-------------------+-------------+------------+
| 123123123|            1|2019-02-22 02:03:00|          bbb|         aaa|
| 123456789|            2|2019-04-23 02:03:00|          ddd|         ccc|
| 321321321|            3|2019-05-23 02:03:00|          eee|        null|
+----------+-------------+-------------------+-------------+------------+
I have no idea how to do it. I heard about the LAG() function, which might be useful, but I found no good examples for my case.
I'd like to use it on a DataFrame in PySpark (if that's important).
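
For illustration, here is a rough sketch of how LAG() might be combined with ROW_NUMBER() to get the shape above. It is not PySpark: to keep it self-contained it runs the SQL against an in-memory SQLite database from Python's standard library (this assumes an SQLite build with window-function support, 3.25 or later). The table name `notes` and the sample rows come from the post; the alias `ranked` and the column `rn` are made up for the example. Spark SQL also has LAG and ROW_NUMBER window functions, so a similar query may carry over, but that is not tested here.

```python
import sqlite3

# In-memory database holding the sample rows from the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (userId INTEGER, testId INTEGER, date TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?, ?)",
    [
        (123123123, 1, "2019-01-22 02:03:00", "aaa"),
        (123123123, 1, "2019-02-22 02:03:00", "bbb"),
        (123456789, 2, "2019-03-23 02:03:00", "ccc"),
        (123456789, 2, "2019-04-23 02:03:00", "ddd"),
        (321321321, 3, "2019-05-23 02:03:00", "eee"),
    ],
)

# LAG(note) looks one row back within each (userId, testId) group, ordered
# by date ascending, so it yields the previous note (NULL for the first row).
# ROW_NUMBER() over the same groups, ordered by date descending, marks the
# newest row of each group with rn = 1.
query = """
SELECT userId, testId, date, note, previousNote
FROM (
    SELECT
        n.*,
        LAG(note) OVER (
            PARTITION BY userId, testId ORDER BY date
        ) AS previousNote,
        ROW_NUMBER() OVER (
            PARTITION BY userId, testId ORDER BY date DESC
        ) AS rn
    FROM notes n
) ranked
WHERE rn = 1
ORDER BY userId
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Computing both window columns in the inner query and filtering on `rn = 1` afterwards matters: filtering to the newest row first would leave LAG() with nothing to look back at.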