MYSQL group by and inner join

Question

I have an article table which holds the number of articles views for each day. A new record is created to hold the count for each seperate day for each article.

The query below gets the article id and total views for the top 5 viewed article id for all time :

SELECT article_id, 
SUM(article_count) as cnt
FROM article_views
GROUP BY article_id
ORDER BY cnt DESC
LIMIT 5

I also have a seperate article table which holds all the article fields. I want to ammend the query above to join to the article table and get two fields for each article id. I have tried to do this below but count is comming back incorrectly :

SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5

Im not sure exactly what im doing wrong. Do I need to do a subquery?

Mahmoud Gamal · Accepted Answer · 2013-01-24T16:17:58.347

15

Add articles.article_title, articles.artcile_url to the GROUP BY clause:

SELECT 
  article_views.article_id, 
  articles.article_title, 
  articles.artcile_url,
  SUM( article_views.article_count ) AS cnt
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id,   
         articles.article_title, 
         articles.artcile_url
ORDER BY cnt DESC
LIMIT 5;

The reason you were not getting correct result set, is that when you select rows that are not included in the GROUP BY nor in an aggregate function in the SELECT clause MySQL picks up random value.

edited Jan 24 '13 at 16:17

answered Jan 24 '13 at 16:06

Mahmoud Gamal

78,257
17
139
164

1

I'm impressed that you seem to know much more about the data structure than is presented in the question. Just one point. MySQL does not pick a *random* value. It picks an *arbitrary* value. *Random* would suggest that any value could be chosen. In fact, it is almost always the values in the first record -- but you cannot depend on that fact. – Gordon Linoff Jan 24 '13 at 16:23
If there happen to be rows in the `article` table that match on `article_id`,`article_title` and `article_url`, (i.e. absent a guarantee that the combination of these three is unique), then the counts returned by this query are still subject to "being off", as compared to the original query. (It does leave one to wonder why there would be any duplicate values in a column named "`article_id`" in the "`article`" table.) – spencer7593 Jan 25 '13 at 00:50

score 3 · Answer 2 · answered Jan 24 '13 at 16:14

You are using a MySQL (mis) feature called Hidden Columns, because article title is not in the group by. However, this may or may not be causing your problem.

If the counts are wrong, then I think you have duplicate article_id in the article table. You can check this by doing:

select article_id, count(*) as cnt
from articles
group by article_id
having cnt > 1

If any appear, then that is your problem. If they all have different titles, then grouping by the title (as suggested by Mahmoud) would fix the problem.

If not, one way to fix it is the following:

SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views INNER JOIN
     (select a.* from articles group by article_id) articles
     ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5

This chooses an abitrary title for the article.

With this approach, the `cnt` value will not be inflated due to duplicate `article_id` values that occur in the `articles` table; it would seem better to insulate the query in this way. — spencer7593, Jan 25 '13 at 00:53

spencer7593 · Answer 3 · 2013-01-24T16:20:37.440

Your query looks basically right to me...

But the value returned for cnt is going to be dependent upon article_id column being UNIQUE in the articles table. We'd assume that it's the primary key, and absent a schema definition, that's only an assumption.)

Also, we're likely to assume there's a foreign key between the tables, that is, there are no values of article_id in the articles_view table which don't match a value of article_id on a row from the articles table.

To check for "orphan" article_id values, run a query like:

SELECT v.article_id
  FROM articles_view v
  LEFT
  JOIN articles a
    ON a.article_id = v.article_id
 WHERE a.article_id IS NULL

To check for "duplicate" article_id values in articles, run a query like:

SELECT a.article_id
  FROM articles a
 GROUP BY a.article_id
HAVING COUNT(1) > 1

If either of those queries returns rows, that could be an explanation for the behavior you observe.

MYSQL group by and inner join

3 Answers3