Optimize a SQL query for tag matching

Question

Example dataset:

id | tag
---|------
1  | car
1  | bike
2  | boat
2  | bike
3  | plane
3  | car

id and tag are both indexed.

I am trying to get the ids that match all of the tags [car, bike] (the number of tags can vary).

A naive query to do so would be:

SELECT id
FROM test
WHERE tag = 'car'
    OR tag =  'bike'
GROUP BY id
HAVING COUNT(*) = 2

However, this is quite inefficient because of the GROUP BY: every row that matches any one of the tags is fed into the grouping, and my table is very large.

Is there a more efficient query for this situation?

The only solution I see would be to have another table containing something like:

id | hash
---|------
1  | car,bike
2  | boat,bike
3  | plane,car

But this is not an easy solution to implement and keep up to date.

Additional info:

the name matching must be exact (no fulltext index)
the number of tags is not always 2

Good presentation of your question. With an [SQLFiddle](http://sqlfiddle.com) example it would be perfect :) – juergen d Oct 08 '12 at 13:35
I'd start by normalizing your tags. You should have a Tags table with ID and Name. Then your dataset above would be id, TagID – Tobsey Oct 08 '12 at 13:35
So in this case the result would be car and bike, because they both have 2 rows with that name? – Diego Oct 08 '12 at 13:38
Your concerns are misplaced. Your "naive" query is just fine. An index on (tag, id) should give very good performance for this query, since it can be satisfied using only the index. – Gordon Linoff Oct 08 '12 at 13:42
@Tobsey Well actually they are all ids, but I wanted to simplify the question as much as possible and make it understandable quickly – Matthieu Napoli Oct 08 '12 at 13:46
Do you currently have a performance problem with your query? Is there a unique constraint on (id, tag)? – Tim Lehner Oct 08 '12 at 14:20
So try my query. If I understood correctly, it will return what you expect – Diego Oct 08 '12 at 14:21
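
The (tag, id) index suggested in the comments can be tried out on the sample data. This is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the real engine (an assumption, since the question does not name one), and the index name `idx_test_tag_id` is made up for illustration. It runs the query from the question and collects the plan the engine reports, so you can check whether the index alone is used.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO test (id, tag) VALUES (?, ?)",
    [(1, "car"), (1, "bike"), (2, "boat"), (2, "bike"), (3, "plane"), (3, "car")],
)

# Composite index with tag first, then id. The name is hypothetical.
conn.execute("CREATE INDEX idx_test_tag_id ON test (tag, id)")

# The query from the question, unchanged.
query = """
    SELECT id
    FROM test
    WHERE tag = 'car'
       OR tag = 'bike'
    GROUP BY id
    HAVING COUNT(*) = 2
"""

matching_ids = [row[0] for row in conn.execute(query)]

# Each plan row has four columns; the last one is the readable description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(matching_ids)
for step in plan:
    print(step)
```

The plan text is engine-specific, so on another database the equivalent of `EXPLAIN` would have to be used and read in its own terms.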

score 0 · Answer 1 · answered Oct 08 '12 at 13:35

Try this:

SELECT id
FROM test
WHERE tag IN ('car', 'bike')
GROUP BY id
HAVING COUNT(*) = 2

Also create a nonclustered index on the tag column.
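
Since the number of tags is not always 2, the IN list and the expected count both have to be built from the tag list. Below is a rough sketch of that, again run against an in-memory SQLite database (an assumption). The helper `ids_matching_all_tags` and the index name are hypothetical. Note one deliberate change from the query above: it compares `COUNT(DISTINCT tag)` instead of `COUNT(*)`, which keeps the result correct if there is no unique constraint on (id, tag) and a row is duplicated.

```python
import sqlite3


def ids_matching_all_tags(conn, tags):
    """Return the ids that carry every tag in `tags` (hypothetical helper)."""
    wanted = sorted(set(tags))  # de-duplicate so the count comparison is meaningful
    if not wanted:
        return []
    # One "?" placeholder per tag; the values are bound, never interpolated.
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT id
        FROM test
        WHERE tag IN ({placeholders})
        GROUP BY id
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query, [*wanted, len(wanted)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO test (id, tag) VALUES (?, ?)",
    [(1, "car"), (1, "bike"), (2, "boat"), (2, "bike"), (3, "plane"), (3, "car")],
)
conn.execute("CREATE INDEX idx_test_tag ON test (tag)")  # index name is hypothetical

car_and_bike = ids_matching_all_tags(conn, ["car", "bike"])
bike_only = ids_matching_all_tags(conn, ["bike"])
three_tags = ids_matching_all_tags(conn, ["car", "bike", "boat"])
print(car_and_bike, bike_only, three_tags)
```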

AnandPhadke

`IN` is a synonym for `OR`. This makes no difference. – podiluska Oct 08 '12 at 13:39
Practically IN is faster than OR – AnandPhadke Oct 08 '12 at 13:41
I didn't think of `IN` indeed, do any of you have a link to support either possibility (faster or not)? – Matthieu Napoli Oct 08 '12 at 13:47
Check this link (http://stackoverflow.com/questions/1013797/is-sql-in-bad-for-performance) and read all the answers there. I also suggest using a temp table when the IN clause has a huge number of arguments, but here I don't think you will have a huge number of tags. – AnandPhadke Oct 08 '12 at 13:58

score 0 · Answer 2 · answered Oct 08 '12 at 14:13

Here you go:

SELECT id
FROM TEST
WHERE tag = 'car'
  AND id IN (SELECT id FROM TEST WHERE tag = 'bike')
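
For a variable number of tags, the same pattern can be generated by adding one more `id IN (...)` clause per extra tag. Here is a small sketch of that idea, run against an in-memory SQLite database (an assumption about the engine). The helper names `build_nested_in_query` and `ids_with_tags` are hypothetical, and the `ORDER BY` is only there to make the output order predictable.

```python
import sqlite3


def build_nested_in_query(tag_count):
    """Build the query text for `tag_count` tags (hypothetical helper)."""
    if tag_count < 1:
        raise ValueError("at least one tag is required")
    # The first tag filters the outer query; each further tag adds an IN subquery.
    clauses = ["tag = ?"]
    clauses += ["id IN (SELECT id FROM test WHERE tag = ?)"] * (tag_count - 1)
    return "SELECT id FROM test WHERE " + " AND ".join(clauses) + " ORDER BY id"


def ids_with_tags(conn, tags):
    """Run the generated query with the tags bound as parameters."""
    query = build_nested_in_query(len(tags))
    return [row[0] for row in conn.execute(query, list(tags))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO test (id, tag) VALUES (?, ?)",
    [(1, "car"), (1, "bike"), (2, "boat"), (2, "bike"), (3, "plane"), (3, "car")],
)

two_tags = ids_with_tags(conn, ["car", "bike"])
one_tag = ids_with_tags(conn, ["car"])
three_tags = ids_with_tags(conn, ["car", "bike", "plane"])
print(two_tags, one_tag, three_tags)
```

Whether this form beats the GROUP BY version depends on the engine and the data, so comparing the execution plans on the real table is the only reliable check.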

Romo

From the OP: "the number of tags is not always 2" – Tim Lehner Oct 08 '12 at 14:18
Yes, but you can extend the query with more "ID in" clauses. You already have to know how many things you are searching for when you build the query. And in this example the query can use the index directly, without any GROUP BY or HAVING COUNT. – Romo Oct 08 '12 at 14:36
Okay to vote it down, but please analyse the query against the other examples in the answers. You will see it is way faster. And no matter how you do it, you still have to "build" the query based on how many tags (car, bike and so on) there are in the query. – Romo Oct 09 '12 at 08:20

score -1 · Answer 3 · answered Oct 08 '12 at 13:44

Not sure if I understand you, but try this:

select tag, count(*) as amount
into #temp
from MYTABLE
group by tag

select t1.tag
from #temp t1 join #temp t2 on t1.amount=t2.amount and t1.tag=t2.tag and t1.amount=2

This should return bike and car, since each of them has 2 rows, which matches the amount of 2.
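
To see what this produces on the question's sample data, here is a rough port to an in-memory SQLite database (an assumption about the engine). SQLite has no `SELECT ... INTO #temp`, so the sketch swaps in `CREATE TEMP TABLE ... AS`, and the temp table name `tag_counts` is made up. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO MYTABLE (id, tag) VALUES (?, ?)",
    [(1, "car"), (1, "bike"), (2, "boat"), (2, "bike"), (3, "plane"), (3, "car")],
)

# Stand-in for "select ... into #temp": one row per tag with its row count.
conn.execute(
    """
    CREATE TEMP TABLE tag_counts AS
    SELECT tag, COUNT(*) AS amount
    FROM MYTABLE
    GROUP BY tag
    """
)

# Same join conditions as the answer's second statement.
rows = conn.execute(
    """
    SELECT t1.tag
    FROM tag_counts t1
    JOIN tag_counts t2
      ON t1.amount = t2.amount
     AND t1.tag = t2.tag
     AND t1.amount = 2
    ORDER BY t1.tag
    """
).fetchall()

tags_with_two_rows = [row[0] for row in rows]
print(tags_with_two_rows)
```

Note that the result is a list of tags that occur twice in the table, not the ids that carry both car and bike.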

Diego

I am trying to optimize the query; your approach doesn't seem more efficient? – Matthieu Napoli Oct 08 '12 at 21:25
I think it's worth giving it a try and comparing the plans. – Diego Oct 09 '12 at 08:28
