1

I have a datatable with some records. Using mysql I am able to get a result grouped by a specific period (year) and users and ordered (in descending order) by number of species.

SELECT YEAR(entry_date) AS period, uid AS user, COUNT(DISTINCT pid) AS species
FROM records
WHERE YEAR(entry_date)<YEAR(CURDATE())
GROUP BY period, uid
ORDER by period, species DESC

Please see attached picture of the result. But what if I only want the get the TOP USER (and number of species) for EACH year (the red marked rows)? How can I achieve that?

I am able to handle this later in my php code but it would be nice to have this sortered out already in mysql query.

Thanks for your help!

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Consider upgrading MySQL as MySQL 5.5 end of life was December 2018 unless you maintain the [Oracle's Lifetime Support policy](https://www.mysql.com/support/eol-notice.html). [Others](https://dba.stackexchange.com/a/234266) including 5.6 runs through Feb 2021 and 5.7 through Oct 2023. – Parfait Apr 09 '20 at 19:32

1 Answers1

1

If you are running MySQL 8.0, you can use RANK() to rank records in years partitions by their count of species, and then filter on the top record per group:

SELECT *
FROM (
    SELECT 
        YEAR(entry_date) AS period, 
        uid AS user, 
        COUNT(DISTINCT pid) AS species,
        RANK() OVER(PARTITION BY YEAR(entry_date) ORDER BY COUNT(DISTINCT pid) DESC) rn
    FROM records
    WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
    GROUP BY period, uid
) t
WHERE rn = 1
ORDER by period

This preserves top ties, if any. Note that uses an index-friendly filter on the dates in the WHERE clause.

In earlier versions, an equivalent option is to filter with a HAVING clause and a correlated subquery:

SELECT 
    YEAR(entry_date) AS period, 
    uid AS user, 
    COUNT(DISTINCT pid) AS species
FROM records r
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
HAVING COUNT(DISTINCT pid) = (
    SELECT COUNT(DISTINCT r1.pid) species1
    FROM records r1
    WHERE YEAR(r1.entry_date) = period
    GROUP BY r1.uid
    ORDER BY species1 DESC
    LIMIT 1
)
ORDER by period
GMB
  • 216,147
  • 25
  • 84
  • 135
  • And what if not? This does not seem to work. Thanks anyway. – Michal Stružský Apr 09 '20 at 18:18
  • @MichalStružský: which version of MySQL are you running? `select version()` can tell you. – GMB Apr 09 '20 at 18:19
  • version 5.5.62-38.14 – Michal Stružský Apr 09 '20 at 18:20
  • But the version using HAVING has a ROW_NUMBER() as well. Can I get rid of it completely? My version does not seem to know that. (Syntax error near '(PARTITION BY YEAR(entry_date) ORDER BY COUNT(DISTINCT pid) DESC) rn FROM recor' at line 5) – Michal Stružský Apr 09 '20 at 18:23
  • @MichalStružský: I just (finished) providing a solution for earlier version (without `ROW_NUMBER()`, which was hanging in there due to bad copy/past). – GMB Apr 09 '20 at 18:24
  • Unfortunately, no rows returned. :( – Michal Stružský Apr 09 '20 at 18:28
  • I adjusted the query to fit my database (it uses czech table names) and the year condition and this query gives me only the top row for 2020. `SELECT YEAR(datum) AS period, uid AS user, COUNT(DISTINCT pid) AS species FROM _zaznamy r WHERE YEAR(datum) <= 2020 GROUP BY period, uid HAVING COUNT(DISTINCT pid) = ( SELECT COUNT(DISTINCT r1.pid) species1 FROM _zaznamy r1 WHERE YEAR(r1.datum) = YEAR(r.datum) GROUP BY r1.uid ORDER BY species1 DESC LIMIT 1 ) ORDER by period` – Michal Stružský Apr 09 '20 at 18:37
  • @MichalStružský: this should work... Can you provide a [db<>fiddle](https://dbfiddle.uk/) with sample data so we can test the query? – GMB Apr 09 '20 at 19:07
  • Sure, here you go. https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=331adadc7fa97df65abb26b86a2ccb6b (It should return uid 2017 | uid 22 | 5; 2018 | uid106 | 5; 2019 | uid106 | 6 if I am not mistaken) – Michal Stružský Apr 09 '20 at 19:40
  • @MichalStružský: I slightly updated the query, this seems to work fine: https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=c003267ef713a2efcbb011cb84b15268 – GMB Apr 09 '20 at 21:38