1
SELECT t.tag_name
FROM tags t
 JOIN resource_tags rt ON rt.tag_id = t.tag_id
 JOIN resource r ON r.resource_id = rt.resource_id
 JOIN visitor_resource vr ON vr.resource_id = rt.resource_id
 JOIN visitor v ON v.visitor_id = vr.visitor_id
WHERE v.visitor_id = '1'
GROUP BY t.tag_name

As you might see, a 'visitor' visits a 'resource' and a relation between them will be created in visitor_resource.

A given 'resource' have a couple of 'tags' depending on the content, and they are bound together by a relation in resource_tags.

The above query outputs the tag-names for all the resources that a visitor have visited.

Now I want to find out how many times a tag is represented.

Consider the following: Resource 1: tag1, tag2 Resource 2: tag1 Resource 3: tag2 Resource 4: tag1

The query should output: tag1, 3 tag2, 2

I have tried with the following:

SELECT t.tag_name, SUM(t.tag_id) as cnt
 FROM tags t
 JOIN resource_tags rt ON rt.tag_id = t.tag_id
 JOIN visitor_resource vr ON vr.resource_id = rt.resource_id
 JOIN visitor v ON v.visitor_id = vr.visitor_id
WHERE v.visitor_id = '2'
GROUP BY t.tag_name

But that seems to have exceptionally unreasonable high numbers for cnt, and not counting for this specific user.

Is this even possible with MySQL?

tshepang
  • 12,111
  • 21
  • 91
  • 136
Lars Hansen
  • 105
  • 1
  • 5

3 Answers3

1

You are doing a SUM on the ID, not a COUNT.

You're basically adding up the auto-generated values of your tags, which is going to give you a much higher result than simply adding up the occurrences.

Paul Alan Taylor
  • 10,474
  • 1
  • 26
  • 42
  • Dude, mistakes like this happen to all of us. I've lost count of the times when I've done the exact opposite boolean operation to what I intended :) – Paul Alan Taylor Apr 21 '11 at 14:04
1

It appears your problem might because you are doing SUM instead of COUNT

SELECT t.tag_name, COUNT(t.tag_id) as cnt

instead of

SELECT t.tag_name, SUM(t.tag_id) as cnt

should do it.

Naraen
  • 3,240
  • 2
  • 22
  • 20
0

Use: COUNT(t.tag_id)

and not SUM(t.tag_id).

It's also more common to group by the id (primary key) and perhaps faster as the index can be used for the grouping. The tag_name may not have an index.

SELECT t.tag_name
     , COUNT(t.tag_id) AS cnt
FROM tags t
  JOIN resource_tags rt ON rt.tag_id = t.tag_id
  JOIN visitor_resource vr ON vr.resource_id = rt.resource_id
  JOIN visitor v ON v.visitor_id = vr.visitor_id
WHERE v.visitor_id = '2'
GROUP BY t.tag_id
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235