I want to pick latest 2
records based on the timestamp column, let's say to perform analysis.
In reality i have huge dataset where i want to pick n
records for each of my user_n.
If i could hardcode the value, it would have been okay.
SELECT *
FROM
`project-id`.huge-dataset-id.streaming-data`
WHERE timestamp_column BETWEEN '2015-06-10 14:20' AND '2015-06-10 14:30'
But it has to be formulated dynamic.I am not so strong in SQL and BigQyery.Please help.
I need Latest N
Records
Tried - (Working Solution), Need more optimised Solution.
SELECT *
FROM (
SELECT *
FROM `project-id`.`dataset`.`streaming_data` s1
WHERE (
SELECT COUNT(*)
FROM `project-id`.`dataset`.`streaming_data` s2
WHERE s1.user_n = s2.user_n
AND s1.timestamp_col <= s2.timestamp_col
) <= 2
)
Since, there are too many nested query, the operation is really slow. Can someone provide an optimised solution.
Data Table as streaming_data
-------------------------------
| user_n | timestamp_column |
|-----------------------------|
| ABC | 10-Jun-12 14.30 |
| DEF | 10-Jun-12 14.30 |
| ABC | 10-Jun-12 14.20 |
| DEF | 10-Jun-12 14.20 |
| ABC | 10-Jun-12 14.10 |
| DEF | 10-Jun-12 14.10 |
| ABC | 10-Jun-12 14.00 |
| DEF | 10-Jun-12 14.00 |
Expected Output : Latest 2 Records for all user, based on the latest timestamp
-------------------------------
| user_n | timestamp_column |
|-----------------------------|
| ABC | 10-Jun-12 14.30 |
| DEF | 10-Jun-12 14.30 |
| ABC | 10-Jun-12 14.20 |
| DEF | 10-Jun-12 14.20 |