Select N most frequent values of column B for each value of column A

Question

With a MySQL table like:

id | colA | colB
...| 1    | 13
...| 1    | 13
...| 1    | 12
...| 1    | 12
...| 1    | 11
...| 2    | 78
...| 2    | 78
...| 2    | 78
...| 2    | 13
...| 2    | 13
...| 2    | 9

For each value in colA I want to find the N most frequent values in colB.

Example result for N=2:

colA | colB
1    | 13
1    | 12
2    | 78
2    | 13

I am able to get all unique combinations of colA and colB with their frequencies using:

SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB ORDER BY freq DESC;

Example result:

colA | colB | freq
1    | 13   | 2
1    | 12   | 2
1    | 11   | 1
2    | 78   | 3
2    | 13   | 2
2    | 9    | 1

But I struggle to apply a LIMIT for each value in colA instead of for the whole table.

This is basically like How to select most frequent value in a column per each id group?, just for MySQL instead of PostgreSQL.

I am using MariaDB 10.1 at the moment.

What is your PRIMARY KEY? – Strawberry Apr 05 '19 at 15:35 — Strawberry, Apr 05 '19 at 15:35
Neither `colA` nor `colB` are primary keys. – Apr 05 '19 at 16:47 — , Apr 05 '19 at 16:47

score 1 · Answer 1 · answered Apr 05 '19 at 15:43

Use window functions, if you can:

SELECT colA, colB, freq
FROM (SELECT colA, colB, COUNT(*) AS freq,
             DENSE_RANK() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) as seqnum
      FROM t
      GROUP BY colA, colB 
     ) ab
WHERE seqnum <= 2;

Note that you might want DENSE_RANK(), RANK() or ROW_NUMBER() depending on how you want to treat ties. If there are 5 colB values with the two highest ranks, then DENSE_RANK() will return all five.

If you want exactly two values, then use ROW_NUMBER().

score 0 · Answer 2 · answered Apr 05 '19 at 15:38

You can probably use a couple CTEs for this, something like:

WITH counts AS (
   SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB ORDER BY freq DESC
), most_freq AS (
   SELECT colA, max(freq) FROM counts GROUP BY colA
)
   SELECT counts.*
     FROM counts
     JOIN most_freq ON (counts.colA = most_freq.colA 
                        AND counts.freq = most_freq.freq);

Select N most frequent values of column B for each value of column A

2 Answers2