-1

I am trying to write the SQL to generate the count of unique users who have purchased product B but have never purchased product C.

user_id product date_purchased
1 A 2015-01-10 00:00:00.000
1 B 2014-11-23 00:00:00.000
1 C 2015-05-01 00:00:00.000
2 A 2014-10-01 00:00:00.000
2 C 2014-12-23 00:00:00.000
3 B 2015-02-15 00:00:00.000
3 D 2014-09-23 00:00:00.000
3 E 2014-06-01 00:00:00.000
4 E 2014-12-14 00:00:00.000
4 F 2015-03-03 00:00:00.000

Someone suggested trying the following

SELECT COUNT(*) AS bought_b_no_c
FROM (
  SELECT user_id
  FROM table_purchases
  WHERE product IN ('B', 'C')
  GROUP BY user_id
  HAVING SUM(product = 'C') = 0
) t;

However, SQL Server is saying I have an error as follows:

Msg 102, Level 15, State 1, Line 5 Incorrect syntax near ')'.

Anybody know why this is happening and how to learn exactly how this work?

PS This was my original code

SELECT Count(*) AS bought_b_no_c
FROM   (SELECT user_id,
               Sum(bought_b_no_c) AS boolean_b_no_c
        FROM   (SELECT user_id,
                       product,
                       CASE
                         WHEN product = 'B' THEN 1
                         WHEN product = 'C' THEN -1
                         ELSE 0
                       END AS bought_b_no_c
                FROM   table_purchases) AS T
        GROUP  BY user_id) AS J
WHERE  boolean_b_no_c = 1
Dale K
  • 25,246
  • 15
  • 42
  • 71
  • 4
    `SUM(product = 'C')` should be `SUM(case when product = 'C' then 1 else 0 end)` – Dale K Dec 28 '21 at 02:36
  • 2
    To avoid that, it would have helped if you tagged SQL Server in [your original question](https://stackoverflow.com/questions/70483247/is-there-a-better-method-instead-of-2-nested-subqueries-for-this-task) from the start, if you're actually using SQL Server and not MySQL. – sticky bit Dec 28 '21 at 02:58
  • 1
    Curious that you include your "original code", but don't mention any problems with it. What will it do if one user purchases two B's? Or three B's and two C's? You might want to have a look at [`not exists`](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/exists-transact-sql?view=sql-server-ver15). – HABO Dec 28 '21 at 02:58
  • Aside: As a rule [`exists`](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/exists-transact-sql?view=sql-server-ver15) should be used when you only want to compare a `count` (or equivalent `sum`) to `0`, i.e. you don't really care about the precise value, just that some rows of interest exist (or don't). The _query optimizer_ ought to generate a more efficient _execution plan_. – HABO Dec 28 '21 at 20:03

2 Answers2

0

You could use this:

SELECT COUNT(b.*) AS bought_b_no_c
FROM table_purchases b
WHERE b.product = 'B'
AND b.user_id NOT IN (
  SELECT c.user_id
  FROM table_purchases c
  WHERE c.product = 'C'
)

This one "reads" more natural to me.


The other solution someone suggested can work with the fix from Dale K:

SELECT COUNT(*) AS bought_b_no_c
FROM (
  SELECT user_id
  FROM table_purchases
  WHERE product IN ('B', 'C')
  GROUP BY user_id
  HAVING SUM(case when product = 'C' then 1 else 0 end) = 0
) t;

I'm not sure which one has better performance, but both should do the trick.

Alisson Reinaldo Silva
  • 10,009
  • 5
  • 65
  • 83
0

Here is a way using NOT EXISTS. I translated the question "get unique users who use product B and has not purchased product C"

select count(distinct x.user) /*gets unique users*/
  from table x
 where x.product='B' /*who purchased product B*/
   and not exists(select null
                    from table y
                   where y.user=x.user
                     and x.product = 'C' /*and the user has not purchased C */)
George Joseph
  • 5,842
  • 10
  • 24