Get Max Count from Group by

Question

I have trouble to get a output for group function in sql.Below are the details for table

I have 1 table table name "checks" have 2 columns pid,cid

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PID                                                VARCHAR2(20)
 CID                                                VARCHAR2(20)

Below are rows available

select * from checks;

PID                  CID
-------------------- --------------------
p1                   c1
p1                   c1
p1                   c2
p1                   c2
p1                   c2
p2                   c1
p2                   c1
p2                   c1
p2                   c1
p2                   c1
p2                   c1
p2                   c2
p2                   c2
p2                   c2
p2                   c2
p2                   c2

P represents participants and c represents category

question

I need to know which participant participate more than one category in that which category participant participate maximum.(for every participants)

Expected result:

pid   cid    count(cid)
---   ---    -----------
p1    c2         3
p2    c1         6

I'm not sure I understand the question fully yet - why does the `p1, c2` row exist in the result? - `p2` has a higher count for `c2` rows. — Damien_The_Unbeliever, Jul 18 '13 at 06:16

Damien_The_Unbeliever · Answer 1 · 2013-07-18T08:27:19.087

5

Assuming a database system (you haven't specified one, but I suspect Oracle?) that supports windowing functions and CTEs, I'd write:

;With Groups as (
    select pid,cid,COUNT(*) as cnt from checks group by pid,cid
), Ordered as (
    select pid,cid,cnt,
       ROW_NUMBER() OVER (PARTITION BY pid ORDER BY cnt desc) as rn,
       COUNT(*) OVER (PARTITION BY pid) as multi
    from Groups
)
select pid,cid,cnt
from Ordered
where rn = 1 and multi > 1

The first CTE (Groups) just finds the counts for each unique cid,pid combination. The second CTE (Ordered) assigns row numbers to these results based on the count - with the highest count assigned a row number of 1. We also count how many total rows have been produced for each pid.

Finally, we select those rows which were assigned a row number of 1 (the highest count) and for which we obtained multiple results for the same pid.

Here's an Oracle fiddle to play with. And here's an SQL Server version (and thanks to Andriy M for producing the Oracle one)

edited Jul 18 '13 at 08:27

answered Jul 18 '13 at 06:26

Damien_The_Unbeliever

234,701
27
340
448

shouldn't it be `PARTITION BY pid` ? – Noel Jul 18 '13 at 06:29
@Ramblin'Man - corrected it (paritally) during the edit and just corrected the other. Thanks. – Damien_The_Unbeliever Jul 18 '13 at 06:30
I wasn't sure whether to edit your answer or offer my Oracle demo in a comment, then decided to do the former. However, even though I agree that Oracle is likely the product the OP is using, we both may be mistaken. So I don't know if my edit was a good idea. Please feel free to change it as you see fit. Maybe, leaving the SQL Server demo available would be fine too, but I'm really not sure (since it certainly doesn't have a `VARCHAR2` type). Sorry if I'm causing any trouble. – Andriy M Jul 18 '13 at 08:21

score 2 · Answer 2 · answered Jul 18 '13 at 09:17

Step by step:

First, get the counts of rows per (PID, CID). This is simple:

SELECT
  PID,
  CID,
  COUNT(*) AS cnt
FROM checks
GROUP BY
  PID,
  CID

And you get this result set for your example:

PID  CID  cnt
---  ---  ---
p1   c1   2
p1   c2   3
p2   c1   6
p2   c2   5

Now, throw in COUNT(*) OVER (PARTITION BY PID) to return the number of categories per person:

SELECT
  PID,
  CID,
  COUNT(*) AS cnt,
  COUNT(*) OVER (PARTITION BY PID) AS cat_cnt
FROM checks
GROUP BY
  PID,
  CID

The OVER clause turns a "normal" aggregate function COUNT() into a window aggregate function. That makes the COUNT(*) operate on the grouped row set rather than the source one. So, COUNT(*) OVER ... in this case counts rows per PID, which for us has the meaning of category counts per person. And this is the updated result set:

PID  CID  cnt  cnt_cat
---  ---  ---  -------
p1   c1   2    2
p1   c2   3    2
p2   c1   6    2
p2   c2   5    2

One more thing left is to rank the cnt values per PID. This may be tricky as there may be ties at the top counts. If you always want a single row per PID and are perfectly indifferent to which CID, cnt will be in case of a tie, you can modify the query like this:

SELECT
  PID,
  CID,
  COUNT(*) AS cnt,
  COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
  ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
FROM checks
GROUP BY
  PID,
  CID

And this is what the result set will look like:

PID  CID  cnt  cnt_cat  rn
---  ---  ---  -------  --
p1   c1   2    2        2
p1   c2   3    2        1
p2   c1   6    2        1
p2   c2   5    2        2

At this point, the results contain all the data necessary to produce the final output, you just need to filter on cnt_cat and rn. However, you cannot do that directly. Instead, use the last query as a derived table, be it a WITH table expression or a "normal" subselect. Below is an example using WITH:

WITH grouped AS (
  SELECT
    PID,
    CID,
    COUNT(*) AS cnt,
    COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
    ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
  FROM checks
  GROUP BY
    PID,
    CID
)
SELECT PID, CID, cnt
FROM grouped
WHERE cat_cnt > 1
  AND rn = 1
;

Here's a SQL Fiddle demo (using Oracle): http://sqlfiddle.com/#!4/cd62d/8

To expand a bit more on the ranking part, if you still want to return a single CID, cnt per PID but would prefer to have more control on what row should be decided as the "winner", you'll need to add a tie-breaker to the ORDER BY clause of the ranking function. As an example, you could modify the original expression,

ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn

with this one:

ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC, CID) AS rn

I.e. the tie-breaker is CID, and so of the two or more CIDs with the top count, the one that sorts before the others wins.

Still, you may want to decide to return all the top counts per PID. In that case, use either RANK() or DENSE_RANK() instead of ROW_NUMBER() (and no tie-breaker), i.e. like this:

RANK() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn

What is the importance of COUNT(*) OVER (PARTITION BY PID) AS cat_cnt in this query? I get results without this column and just filtering for rn = 1. — Hillarie, Apr 10 '20 at 19:00
@Hillarie: It's to fulfil this part of the requirement: *‘I need to know which participant participate more than one category […]’* — Andriy M, Apr 11 '20 at 22:41

score 1 · Answer 3 · answered Jul 18 '13 at 07:02

1

This will give you some basic ideas:

enter image description here

And the results shown below. Also since p1 participated in more than one Category so p1 will come with each new category in a different row when we use: 'group by PID,CID'

enter image description here

answered Jul 18 '13 at 07:02

R.C

10,417
2
35
48

score -1 · Answer 4 · edited Jul 18 '13 at 08:06

-1

select pid, cid, count
from (
    select pid, cid, count(*) as count
    from checks
    group by pid, cid
    order by count DESC
) as temp
group by pid;

The same is working in MySQL.

edited Jul 18 '13 at 08:06

Andriy M

76,112
17
94
154

answered Jul 18 '13 at 06:16

Karthikeyan

990
5
12

Can you check this, I had to delete and edit again, if working, just upvote – Karthikeyan Jul 18 '13 at 06:30
You would still get all cid's not just the one with the highest count. But you are on the right track =) – The_Cthulhu_Kid Jul 18 '13 at 06:43
You solution is obviously wrong. You can't `SELECT` the `cid` when your're grouping by something else. – Emadpres Nov 30 '16 at 11:03

score -1 · Answer 5 · edited May 23 '17 at 10:30

-1

Here is MySQL solution:

SELECT tbl1.pid, tbl1.cid, tbl1.pairCount

FROM        (   SELECT checks.pid, checks.cid, COUNT(*) AS pairCount
                FROM checks
                GROUP BY checks.pid, checks.cid ) AS tbl1

            INNER JOIN 

            (   SELECT checks.pid, checks.cid, COUNT(*) AS pairCount
                FROM checks 
                GROUP BY checks.pid, checks.cid ) AS tbl2   

            ON tbl1.pid=tbl2.pid AND tbl1.cnt > tbl2.cnt

Sorry, I am using 2 subqueries, but couldn't figure out anything better. At least, it works. Fiddle

I couldn't simply use GROUP BY because, when using GROUP BY values returned for non-grouped columns are arbitrary, not from the same row where the MAX() value occurs: MYSQL shows incorrect rows when using GROUP BY

edited May 23 '17 at 10:30

Community

1
1

answered Jul 18 '13 at 06:52

user4035

22,508
11
59
94

Finally I found the answer – user01 Jul 18 '13 at 07:11
SELECT mc,c.pid pno,b.cid cno from (select max(c1) mc,pid from (select count(pid||cid) c1,pid,cid from checks group by pid,cid) a group by pid) c left outer join(select count(pid||cid) c1,pid,cid from checks group by pid,cid) b on c.pid=b.pid and mc=b.c1; – user01 Jul 18 '13 at 07:11
@Ayeswarya I think, your variant is worse, then mine as you are using 3 subquesries, while I am using 2. – user4035 Jul 18 '13 at 07:18
Although this query works for given scenario, but generally it is wrong. Because, for each `pid`, the condition `tbl1.cnt > tbl2.cnt` would be true for all the tuples but the one with minimum value. – Emadpres Nov 30 '16 at 13:49

Get Max Count from Group by

question

5 Answers5