Let's say i have a table with Users :
user_id |
---|
abc |
def |
And a table of Purchases :
purchase_id | purchase_date | status | user_id |
---|---|---|---|
1 | 2020-01-01 | sold | abc |
2 | 2020-02-01 | refunded | abc |
3 | 2020-03-01 | sold | def |
4 | 2020-04-01 | sold | def |
5 | 2020-05-01 | sold | def |
I want the status of the last purchase for each user, which would be :
user_id | last_purchase_date | status |
---|---|---|
abc | 2020-02-01 | refunded |
def | 2020-05-01 | sold |
I am wondering what is the best approach (in term of performance, cost, readability ...) between those three queries that give the same results :
Aggregated Function
SELECT
user_id,
MAX(purchase_date) as last_purchase_date,
ARRAY_AGG(status ORDER BY purchase_date DESC LIMIT 1)[SAFE_OFFSET(0)] as last_status
FROM user
LEFT JOIN purchase USING (user_id)
GROUP BY user_id
Analytics Function
SELECT
DISTINCT
user_id,
MAX(purchase_date) OVER (PARTITION BY user_id) as last_purchase_date,
LAST_VALUE(status) OVER (PARTITION BY user_id ORDER BY purchase_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_status,
FROM user
LEFT JOIN purchase USING (user_id)
Subquery
SELECT
user_id,
purchase_date as last_purchase_date,
status as last_status
FROM user
LEFT JOIN purchase USING (user_id)
WHERE purchase_date IN (
SELECT
MAX(purchase_date) as purchase_date
FROM purchase
GROUP BY user_id
)
Here is the Dataset for those who want it :
WITH purchase as (
SELECT 1 as purchase_id, "2020-01-01" as purchase_date, "sold" as status, "abc" as user_id
UNION ALL SELECT 2 as purchase_id, "2020-02-01" as purchase_date, "refunded" as status, "abc" as user_id
UNION ALL SELECT 3 as purchase_id, "2020-03-01" as purchase_date, "sold" as status, "def" as user_id
UNION ALL SELECT 4 as purchase_id, "2020-04-01" as purchase_date, "sold" as status, "def" as user_id
UNION ALL SELECT 5 as purchase_id, "2020-05-01" as purchase_date, "sold" as status, "def" as user_id
), user as (
SELECT "abc" as user_id,
UNION ALL SELECT "def" as user_id,
)