0

I am currently working on a system to allow documents to be tagged with keywords and have a table of the form:

CREATE TABLE `KeywordsToDocuments` (
    `keywordID` int NOT NULL,
    `documentID` int NOT NULL);

It is possible for each document to be associated with many keywords, and for each keyword to be associated with many documents. Unfortunately a simple AND statement does not achieve what I need as no single row will satisfy multiple values of KeywordID, instead a publication is associated with several keywords by inclusion of its ID on several rows with different keywordIDs.

I am attempting to write a query which will give me a list of documents that are tagged with all of an arbitrary number of keywords. My initial attempt was the following which uses the INTERSECT statement:

SELECT documentID FROM KeywordsToDocuments WHERE KeywordID=keyword1
INTERSECT
SELECT documentID FROM KeywordsToDocuments WHERE KeywordID=keyword2
...

Where the ... can be an arbitrary number of similar statements for further keywords. This query is built by a PHP script dynamically.

What I am trying to do here is to find all of the documents that have an association with each keyword individually, then from those result sets find the intersect which will be documents which are associated with all off the requested keywords.

However I just learned that MySQL does not support the INTERSECT statement. I have been looking for alternatives but all the resources I have found on replacing INTERSECT have focussed on it being used to merge results from two different tables. I cannot see a way in this case to convert to another statement such as an INNER JOIN.

How can make a query like this work without using the INTERSECT statement to permit compatibility with MySQL?

Vality
  • 6,577
  • 3
  • 27
  • 48
  • A join should imho indeed be the right way. What joins did you already try that did not lead to your expected result? – ArSeN Jan 01 '16 at 20:43
  • @ArSeN I have not tried a join yet as I am not sure how to accomplish the needed semantics using a join. That is the essence of my problem – Vality Jan 01 '16 at 20:44
  • @splash58 That will look for a single row is equal to both no? That will never happen. – Vality Jan 01 '16 at 20:45
  • Well, do you exactly know how joins do work? If not, you should probably read up on them. If a join is in fact the wrong choice, then I did not fully understand your problem. Could you please elaborate on your table structure and what result you do exactly want to query? – ArSeN Jan 01 '16 at 20:46
  • @Vality i reread your question. i was wrong – splash58 Jan 01 '16 at 20:48

1 Answers1

2

I like to approach these queries using aggregation and a having clause:

SELECT documentID
FROM KeywordsToDocuments
WHERE KeywordID IN (keyword1, keyword2)
GROUP BY documentID
HAVING COUNT(*) = 2;

This is standard SQL and should work in any database. In addition, by modifying the HAVING clause, you have a lot of flexibility on the logic of combinations of keywords you are looking for.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • That is wonderful, I had not thought about using a HAVING. Thank you, indeed that looks substantially neater than my original code including the `INTERSECT`. I shall accept in a few moments. – Vality Jan 01 '16 at 20:53