SQL: how to get sum of grouped items for a given corhort

Question

I have the following orders table:

(1, 2, '2021-03-05', 15, 'books'),
(1, 13, '2022-03-07', 3, 'music'),
(1, 14, '2022-06-15', 900, 'travel'),
(1, 11, '2021-11-17', 25, 'books'),
(1, 16, '2022-08-03', 32, 'books'),
(2, 4, '2021-04-12', 4, 'music'),
(2, 7, '2021-06-29', 9, 'music'),
(2, 20, '2022-11-03', 8, 'music'),
(2, 22, '2022-11-07', 575, 'food'),
(2, 24, '2022-11-20', 95, 'food'),
(3, 3, '2021-03-17', 25, 'books'),
(3, 5, '2021-06-01', 650, 'travel'),
(3, 17, '2022-08-17', 1200, 'travel'),
(3, 19, '2022-10-02', 6, 'music'),
(3, 23, '2022-11-08', 7, 'food'),
(4, 9, '2021-08-20', 3200, 'travel'),
(4, 10, '2021-10-29', 2750, 'travel'),
(4, 15, '2022-07-15', 1820, 'travel'),
(4, 21, '2022-11-05', 8000, 'travel'),
(4, 25, '2022-11-29', 2300, 'travel'),
(5, 1, '2021-01-04', 3, 'music'),
(5, 6, '2021-06-09', 820, 'travel'),
(5, 8, '2021-07-30', 19, 'books'),
(5, 12, '2021-12-10', 22, 'music'),
(5, 18, '2022-09-19', 20, 'books'),
(6, 26, '2023-01-09', 700, 'travel'),
(6, 27, '2023-01-23', 1900, 'travel')

Here's a Fiddle: http://sqlfiddle.com/#!17/71698/3

I would like to get the sum of revenue by product among those customers who have ever purchased a travel product.

In this case, customers 1, 3, 4, 5, and 6 have purchased the travel product. Therefore, the desired result set would look like this:

customer_id	revenue_books	revenue_music	revenue_food
1	72	3	0
3	25	6	7
4	0	0	0
5	39	25	0
6	0	0	0

How would I do this? Thank you!

Horaciux · Answer 1 · 2023-02-01T22:12:39.543

SELECT
  user_id
  ,sum(case when product='books' then revenue else 0 end) as revenue_books    
  ,sum(case when product='music' then revenue else 0 end) as revenue_music    
  ,sum(case when product='food' then revenue else 0 end) as revenue_food    
  FROM
  orders
  where user_id in (select user_id from orders where product='travel')
  group by user_id

http://sqlfiddle.com/#!17/71698/5

EDIT

As suggested, this is another option. But it shows null instead of zero. If zero is needed you should use coalesce()

SELECT
  user_id
  ,coalesce(sum(revenue) filter (where product = 'books'),0) as revenue_books    
  ,coalesce(sum(revenue) filter (where product = 'music'),0)  as revenue_music    
  ,coalesce(sum(revenue) filter (where product = 'food'),0)  as revenue_food    
  FROM  orders
  where user_id in (select user_id from orders where product='travel')
  group by user_id

http://sqlfiddle.com/#!17/71698/7

score 1 · Answer 2 · answered Feb 01 '23 at 21:15

1

I my answer I show how to think about the problem to get the result -- break it down to parts and then combine it. Some answer give a less verbose query, but I don't think they will be faster. This should be easier to understand for someone new to SQL

First the people who have purchased a travel product

SELECT DISTINCT user_id
FROM orders 
WHERE product = 'travel'

You care about books, music and food, you can get the totals for those like this:

SELECT user_id, product, SUM(revenue) as TOT
FROM orders
GROUP BY user_id, product 
WHERE product in ('books', 'music', 'food'),

Now join them together

WITH sums AS (
  SELECT user_id, product, SUM(revenue) as TOT
  FROM orders
  GROUP BY user_id, product 
  WHERE product in ('books', 'music', 'food'),
)
SELECT u.user_id, books.TOT as book_total, music.TOT as music_total, food.TOT as food_total
FROM (
  SELECT DISTINCT user_id
  FROM orders 
  WHERE product = 'travel'
) as U
LEFT JOIN sums as books ON u.userid = books.user_id and books.product = 'books'
LEFT JOIN sums as music ON u.userid = music.user_id and music.product = 'music'
LEFT JOIN sums as food ON u.userid = food.user_id and food.product = 'food'

answered Feb 01 '23 at 21:15

Hogan

69,564
10
76
117

1

Using conditional aggregation with a `filter()` or `CASE` expression is much more efficient. – Feb 01 '23 at 21:35
I believe you @a_horse_with_no_name about filter() but not sure about the CASE -- wouldn't that be dependent on the data -- for example a table with 10 mill records of which only 1% were books, music, and food would be faster with an approach like mine -- right? – Hogan Feb 01 '23 at 22:26
A CASE expression has about the same performance as the equivalent `filter()` exxpression – Feb 01 '23 at 22:34
It is a shame I'm not Quassnoi I could go out and build a test case and see for sure. I'm not, so sadly we'll never know for sure. – Hogan Feb 01 '23 at 22:42
2

Well, that's not so complicated to setup: https://dbfiddle.uk/wZRsZxUO Here are the plans with a table with 5 million rows: https://explain.depesz.com/s/vF44 and https://explain.depesz.com/s/Lm22 – Feb 01 '23 at 22:58
@a_horse_with_no_name - very cool. I put mine in. It was slower than both of yours with 1e5 and faster than both of yours with 1e6. My guess was right -- at some point this is faster. -> https://dbfiddle.uk/YQokF-qY – Hogan Feb 02 '23 at 22:18

SQL: how to get sum of grouped items for a given corhort

2 Answers2