-2

I'd like to get the count of users grouped by the number of comments they've made.

[User]: ID
[Comment]: ID, UserID

So if user A has made 1 comment, user B has made 1 comment and user C has made 2 comments, then the output would be:

0 comments => 0 users
1 comment  => 2 users (A+B)
2 comments => 1 user  (C)

How would you query this?

randomguy
  • 12,042
  • 16
  • 71
  • 101
  • 2
    You are expected to provide the relevant table layout with the question, and what you have tried (even if it's not working). You have been around for long enough and should know the basics. – Erwin Brandstetter Aug 24 '14 at 01:09
  • Hi Erwin! Actually, I wasn't aware of that rule, so glad you made the comment. I assumed the one-to-many association is so elementary and standard, that defining table layouts wouldn't be necessary. Anyhow, thanks for the feedback! :) – randomguy Aug 24 '14 at 10:41
  • 2
    There are many subtle variants. NOT NULL, UNIQUE, PRIMARY KEY constraints, data types, cardinalities, width of the row, indexes, frequencies of values, ... They all matter for the best solution. It's best to provide the (relevant parts of) your actual table layouts you get in psql with `\d tbl`. Better yet, provide an [SSCCE](http://www.sscce.org/) in an [sql fiddle](http://sqlfiddle.com/#!11/53c1c/3) (random example). – Erwin Brandstetter Aug 24 '14 at 14:18

2 Answers2

3

It will depend on your specific database structure, but let's say you have a users table and a comments table:

users table:
id: serial
name: text

comments table:
id: serial
user_id: integer (foreign key to the users table)
comment: text

You can count the number of comments each user has made with this query:

  SELECT users.id, users.name, count(comments.id) as comment_cnt
    FROM users LEFT JOIN
         comments ON users.id = comments.user_id
GROUP BY users.id, users.name

You can then use the results of this query in a nested query to count the number of users for each number of comments:

  SELECT comment_cnt, count(*) FROM
  (SELECT users.id, users.name, count(comments.id) as comment_cnt
    FROM users LEFT JOIN
         comments ON users.id = comments.user_id
GROUP BY users.id, users.name) AS comment_cnts
GROUP BY comment_cnt;

I don't know of any elegant way to fill the gaps where there are zero users for a given number of comments, but a temporary table and another level of nesting works:

CREATE TABLE wholenumbers (num integer);

INSERT INTO wholenumbers VALUES (0), (1), (2), (3), (4), (5), (6);

   SELECT num as comment_cnt, COALESCE(user_cnt,0) as user_cnt
     FROM wholenumbers
LEFT JOIN (SELECT comment_cnt, count(*) AS user_cnt
             FROM (  SELECT users.id, users.name, count(comments.id) AS comment_cnt
                       FROM users LEFT JOIN comments ON users.id = comments.user_id
                   GROUP BY users.id, users.name) AS comment_cnts
         GROUP BY comment_cnt) AS user_cnts ON wholenumbers.num = user_cnts.comment_cnt
ORDER BY num;
ClaytonC
  • 475
  • 4
  • 8
  • 1
    Awesome, it works! Small additional request: how would you go about filling the count gaps where there are no users for certain amount of comments (ie. `0 => 20 users, 1 => 0 users [we want this], 2 => 5 users`)? – randomguy Aug 23 '14 at 23:28
  • 1
    I don't know of any elegant way to do that. You could manually create a temporary table with a single integer column and n rows, each with one whole number in it 1, 2, 3, 4 ... however many you like. You can then add another level of nesting. I'll add the revised query to my answer. – ClaytonC Aug 23 '14 at 23:52
  • 2
    Thanks @ErwinBrandstetter. I didn't know about that. Does your solution show the number of users who have 0 comments? – ClaytonC Aug 24 '14 at 01:53
  • 2
    @ClaytonC: Good point, thanks. Now it does. That makes more sense than starting with the smallest count. – Erwin Brandstetter Aug 24 '14 at 01:56
  • 1
    Thanks again @ErwinBrandstetter. Looks like you also need to change the WITH query to reference the users table, otherwise there is no way to know how many users there are without any comments. As it stands, your solution does not reference the users table at all. – ClaytonC Aug 24 '14 at 02:02
  • 1
    Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/59869/discussion-between-claytonc-and-erwin-brandstetter). – ClaytonC Aug 24 '14 at 02:06
  • 1
    @ClaytonC: Considerably faster this way. But you are right, that count for users with 0 comments was missing, too. I added a bit to my answer to address that. – Erwin Brandstetter Aug 24 '14 at 02:42
2

Building on the table layout @ClaytonC provided:

WITH cte AS (
   SELECT msg_ct, count(*) AS users
   FROM  (
      SELECT count(*) AS msg_ct
      FROM   comments 
      GROUP  BY user_id
      ) sub
   GROUP  BY 1
   )
SELECT msg_ct, COALESCE(users, 0) AS users
FROM   generate_series(0, (SELECT max(msg_ct) FROM cte)) msg_ct
LEFT   JOIN cte USING (msg_ct)
ORDER  BY 1;

Major points

  • First, count comments per user (msg_ct). As long as referential integrity is enforced by a foreign key, you do not need to join to the users table at all to aggregate comments per user. Just count rows in comments.
    Next, count users per message count (users).

  • I am doing this in a CTE, because I use the derived table twice in the final query.
    First for generate_series() to generate all counts from min to max dynamically, including gaps.
    Then for the table to LEFT JOIN to and get the final result.

  • The count starts with 0 (after my update). If you want to have it start with the smallest actual msg_ct, consider the first draft of my answer in the edit history.

  • Closely related answer explaining the basics:

Count users without comments

As @ClaytonC commented, the above answer does not include users without comments.

To fix this (if you actually need it), either LEFT JOIN to users right at the start after all:

WITH cte AS (
   SELECT msg_ct, count(*) AS users
   FROM  (
      SELECT count(c.user_id) AS msg_ct
      FROM   users u
      LEFT   JOIN comments c ON c.user_id = u.id
      GROUP  BY u.id
      ) sub
   GROUP  BY 1
   )
SELECT ...

Or, since the join is just for finding users without comments, we might get away cheaper: Count all users and subtract users with comments (which we processed anyway):

WITH cte AS (
   SELECT msg_ct, count(*)::int AS users
   FROM  (
      SELECT count(*)::int AS msg_ct
      FROM   comments 
      GROUP  BY user_id
      ) sub
   GROUP  BY 1
   )
, agg AS (
   SELECT max(msg_ct)   AS max_ct      -- maximum for generate_series
         ,((SELECT count(*) FROM users) - sum(users))::int AS users
                                       -- quiet rest with 0 comments
   FROM cte
   )
SELECT 0 AS msg_ct, users FROM agg     -- users with 0 comments
UNION  ALL
SELECT msg_ct, COALESCE(users, 0)
FROM  (SELECT generate_series(1, max_ct) AS msg_ct FROM agg) g
LEFT   JOIN cte USING (msg_ct)
ORDER  BY 1;

The query gets a bit more complex, but it might be faster for big tables. Not sure. Test with EXPLAIN ANALYZE (I would be grateful for a comment with the results.)

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228