3

Suppose I have a table like this,

id tagId
1 1
1 2
1 5
2 1
2 5
3 2
3 4
3 5
3 8

I want to select id's where tagId includes both 2 and 5. For this fake data set, It should return 1 and 3.

I tried,

select id from [dbo].[mytable] where tagId IN(2,5)

But it takes 2 and 5 into account respectively. I also did not want to keep my table in wide format since tagId is dynamic. It can reach any number of columns. I also considered filtering with two different queries to find (somehow) the intersection. However since I may search more than two values inside the tagId in real life, it sounds inefficient to me.

I am sure that this is something faced before when tag searching. What do you suggest? Changing table format?

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
maydin
  • 3,715
  • 3
  • 10
  • 27

2 Answers2

3

One option is to count the number of distinct tagIds (from the ones you're looking for) each id has:

SELECT   id
FROM     [dbo].[mytable] 
WHERE    tagId IN (2,5)
GROUP BY id
HAVING   COUNT(DISTINCT tagId) = 2
Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • Clever solution. Little typo: `DISTINCT`. – Zakaria Dec 24 '21 at 21:14
  • The number 2 at the last step represents _at least_ condition for the number of the tags I am searching. Right? – maydin Dec 24 '21 at 21:16
  • @Zaki indeed a typo, thanks for noticing. Edited and fixed – Mureinik Dec 24 '21 at 21:18
  • 1
    @maydin The `where` clause limits the return rows to tagIds of 2 and 5, and then the `having` clause makes sure they are both there. If an ID has additional tags it won't interfere with this query – Mureinik Dec 24 '21 at 21:19
2

This is actually a Relational Division With Remainder question.

First, you have to place your input into proper table format. I suggest you use a Table Valued Parameter if executing from client code. You can also use a temp table or table variable.

DECLARE @ids TABLE (tagId int PRIMARY KEY);
INSERT @ids VALUES (2), (5);

There are a number of different solutions to this type of question.

  1. Classic double-negative EXISTS

    SELECT DISTINCT
      mt.Id
    FROM mytable mt
    WHERE NOT EXISTS (SELECT 1
        FROM @ids i
        WHERE NOT EXISTS (SELECT 1
            FROM mytable mt2
            WHERE mt2.id = mt.id
              AND mt2.tagId = i.tagId)
    );
    

    This is not usually efficient though

  2. Comparing to the total number of IDs to match

    SELECT mt.id
    FROM mytable mt
    JOIN @ids i ON i.tagId = mt.tagId
    GROUP BY mt.id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM @ids);
    

    This is much more efficient. You can also do this using a window function, it may be more or less efficient, YMMV.

    SELECT mt.Id
    FROM mytable mt
    JOIN (
        SELECT *,
          total = COUNT(*) OVER ()
        FROM @ids i
    ) i ON i.tagId = mt.tagId
    GROUP BY mt.id
    HAVING COUNT(*) = MIN(i.total);
    
  3. Another solution involves cross-joining everything and checking how many matches there are using conditional aggregation

    SELECT mt.id
    FROM (
        SELECT
          mt.id,
          mt.tagId,
          matches = SUM(CASE WHEN i.tagId = mt.tagId THEN 1 END),
          total = COUNT(*)
        FROM mytable mt
        CROSS JOIN @ids i
        GROUP BY
          mt.id,
          mt.tagId
    ) mt
    GROUP BY mt.id
    HAVING SUM(matches) = MIN(total)
       AND MIN(matches) >= 0;
    

db<>fiddle

There are other solutions also, see High Performance Relational Division in SQL Server

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • This is very informative. Thank you @Charlieface . I wish I have a chance to accept two answers together. But upvoted! – maydin Dec 25 '21 at 22:56