Select top 10 records for each category

Question

I want to return top 10 records from each section in one query. Can anyone help with how to do it? Section is one of the columns in the table.

Database is SQL Server 2005. I want to return the top 10 by date entered. Sections are business, local, and feature. For one particular date I want only the top (10) business rows (most recent entry), the top (10) local rows, and the top (10) features.

Darrel Miller · Answer 1 · 2014-05-04T17:05:21.847

273

If you are using SQL 2005 you can do something like this...

SELECT rs.Field1,rs.Field2 
    FROM (
        SELECT Field1,Field2, Rank() 
          over (Partition BY Section
                ORDER BY RankCriteria DESC ) AS Rank
        FROM table
        ) rs WHERE Rank <= 10

If your RankCriteria has ties then you may return more than 10 rows and Matt's solution may be better for you.

edited May 04 '14 at 17:05

answered Oct 07 '08 at 02:13

Darrel Miller

139,164
32
194
243

41

If you really just want the top 10, change it to RowNumber() instead of Rank(). No ties then. – Mike L Oct 07 '08 at 03:26
4

This works, but be aware that rank() is likely to be turned into a full table sort by the query planner if there isn't an index who's *first* key is the RankCriteria. In this case you may get better mileage selecting the distinct sections and cross applying to pick out the top 10 ordered by RankCriteria desc. – Joe Kearney Feb 13 '13 at 09:42
Great Answer! Got me almost exactly what I needed. I ended up going with `DENSE_RANK` which doesn't have any gaps in numbering. +1 – Michael Stramel Mar 21 '13 at 23:33
"FROM table ) rs WHERE Rank <= 10)" what is this 'rs' thing – Facbed May 04 '14 at 15:41
1

@Facbed It's just an alias on the table. – Darrel Miller May 04 '14 at 17:04
20

For anyone using Sql Server, the RowNumber() function mentioned by Mike L is ROW_NUMBER(). – randomraccoon Jul 06 '16 at 23:07

score 124 · Answer 2 · edited Jun 25 '12 at 11:39

124

In T-SQL, I would do:

WITH TOPTEN AS (
    SELECT *, ROW_NUMBER() 
    over (
        PARTITION BY [group_by_field] 
        order by [prioritise_field]
    ) AS RowNo 
    FROM [table_name]
)
SELECT * FROM TOPTEN WHERE RowNo <= 10

edited Jun 25 '12 at 11:39

kratenko

7,354
4
36
61

answered Jun 25 '12 at 10:16

Phil Rabbitt

1,241
1
8
2

2

:Please be more descriptive about your solution. Refer:[How to Answer](http://stackoverflow.com/questions/how-to-answer) – askmish Oct 20 '12 at 01:20
Is select query at CTE can contain where clause? – toha Sep 13 '16 at 10:37
1

@toha Yes it can – KindaTechy May 11 '17 at 14:05
2

Although you say "In T-SQL" this works for any database implementing the `ROW_NUMBER` function. For example, I've used this solution in SQLite. – Tony Oct 06 '19 at 09:57
2

It works for postgres sql as well. I just had to use "order by [prioritise_field] desc " – Phun May 11 '20 at 05:58
you are adding a row number to each distinct `[group_by_field]` in the order of `[prioritise_field]`. These row numbers are then used to filter out rows in the outer query. – metinsenturk Jun 16 '23 at 18:26

lorond · Answer 3 · 2020-10-11T10:55:40.897

46

SELECT r.*
FROM
(
    SELECT
        r.*,
        ROW_NUMBER() OVER(PARTITION BY r.[SectionID]
                          ORDER BY r.[DateEntered] DESC) rn
    FROM [Records] r
) r
WHERE r.rn <= 10
ORDER BY r.[DateEntered] DESC

edited Oct 11 '20 at 10:55

answered Jun 15 '12 at 14:26

lorond

3,856
2
37
52

What if there is a tie in DateEntered? – Yiping Oct 12 '20 at 01:07
1

@Yiping Undefined. Any row could win. It depends on many circumstances and can vary. You can make the order be more specific by adding more columns, e.g. `ORDER BY r.DateEntered, r.ID`, but it's all up to your particular task. – lorond Oct 13 '20 at 18:46
Thx, I think use RANK() in your answer will make it work. – Yiping Oct 14 '20 at 08:42
@Yiping question was about top 10 rows from each category, RANK() could produce more. But if you need top 10 ties, RANK() definitely would be the best option. – lorond Oct 14 '20 at 12:45
rank() and dense_rank() was a bad idea, in my case using rank or dense_rank returned 15 rows, not 10. but when I used row_number() it always returned up to 10 rows only - correctly. you have my vote for this best answer for my needs, similar to OP – hamish Feb 09 '21 at 12:07

Matt Hamilton · Answer 4 · 2008-10-07T02:26:04.817

35

This works on SQL Server 2005 (edited to reflect your clarification):

select *
from Things t
where t.ThingID in (
    select top 10 ThingID
    from Things tt
    where tt.Section = t.Section and tt.ThingDate = @Date
    order by tt.DateEntered desc
    )
    and t.ThingDate = @Date
order by Section, DateEntered desc

edited Oct 07 '08 at 02:26

answered Oct 07 '08 at 02:09

Matt Hamilton

200,371
61
386
320

2

This doesn't work for rows where Section is null, though. You'd need to say "where (tt.Section is null and t.Section is null) or tt.Section = t.Section" – Matt Hamilton Oct 07 '08 at 02:11

Bill Karwin · Answer 5 · 2010-07-02T17:06:41.023

19

I do it this way:

SELECT a.* FROM articles AS a
  LEFT JOIN articles AS a2 
    ON a.section = a2.section AND a.article_date <= a2.article_date
GROUP BY a.article_id
HAVING COUNT(*) <= 10;

update: This example of GROUP BY works in MySQL and SQLite only, because those databases are more permissive than standard SQL regarding GROUP BY. Most SQL implementations require that all columns in the select-list that aren't part of an aggregate expression are also in the GROUP BY.

edited Jul 02 '10 at 17:06

answered Oct 07 '08 at 04:07

Bill Karwin

538,548
86
673
828

1

Does that work? I'm pretty sure you'd "a.somecolumn is invalid in the select list as it is not contained in an aggregate function or the group by clause" for every column in articles except article_id.. – Blorgbeard Oct 07 '08 at 10:39
1

You should be able to include other columns that are functionally dependent on the column(s) named in the GROUP BY. Columns that are not functionally dependent are ambiguous. But you're right, depending on RDBMS implementation. It works in MySQL but IIRC fails in InterBase/Firebird. – Bill Karwin Oct 07 '08 at 21:03
1

Would this work in the case that the top eleven records for a section all had the same date? They would all have counts of 11 and the result would be an empty set. – Arth Nov 05 '15 at 17:13
No, you need to have some way of breaking ties if they all have the same date. See http://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column/123481#123481 for an example. – Bill Karwin Nov 05 '15 at 18:27
@BillKarwin thanks! Do you know if there's a way to do it when an article belongs to more than 1 section. For example, referring to the original question, an article about a local business might be classified under both "local" and "business" sections. – carlosgg Mar 16 '17 at 20:20
1

@carlosgg, if articles have a many-to-many relationship with sections, then you'd need to have an intersection table to map articles to their sections. Then your query would have to join to an intersection table for the m2m relationship, and group by article_id and section. That should get you started, but I'm not going to write out the whole solution in a comment. – Bill Karwin Mar 16 '17 at 23:42

Vadim Loboda · Answer 6 · 2019-09-13T07:16:35.710

If we use SQL Server >= 2005, then we can solve the task with one select only:

declare @t table (
    Id      int ,
    Section int,
    Moment  date
);

insert into @t values
(   1   ,   1   , '2014-01-01'),
(   2   ,   1   , '2014-01-02'),
(   3   ,   1   , '2014-01-03'),
(   4   ,   1   , '2014-01-04'),
(   5   ,   1   , '2014-01-05'),

(   6   ,   2   , '2014-02-06'),
(   7   ,   2   , '2014-02-07'),
(   8   ,   2   , '2014-02-08'),
(   9   ,   2   , '2014-02-09'),
(   10  ,   2   , '2014-02-10'),

(   11  ,   3   , '2014-03-11'),
(   12  ,   3   , '2014-03-12'),
(   13  ,   3   , '2014-03-13'),
(   14  ,   3   , '2014-03-14'),
(   15  ,   3   , '2014-03-15');


-- TWO earliest records in each Section

select top 1 with ties
    Id, Section, Moment 
from
    @t
order by 
    case 
        when row_number() over(partition by Section order by Moment) <= 2 
        then 0 
        else 1 
    end;


-- THREE earliest records in each Section

select top 1 with ties
    Id, Section, Moment 
from
    @t
order by 
    case 
        when row_number() over(partition by Section order by Moment) <= 3 
        then 0 
        else 1 
    end;


-- three LATEST records in each Section

select top 1 with ties
    Id, Section, Moment 
from
    @t
order by 
    case 
        when row_number() over(partition by Section order by Moment desc) <= 3 
        then 0 
        else 1 
    end;

+1 I like this solution for it's simplicity but could you explain how using `top 1` works with the `case` statement in the `order by` clause returning 0 or 1? — Ceres, Jun 19 '15 at 14:40
TOP 1 works with WITH TIES here. WITH TIES means that when ORDER BY = 0, then SELECT takes this record (because of TOP 1) and all others that have ORDER BY = 0 (because of WITH TIES) — Vadim Loboda, Jun 19 '15 at 15:24

Blorgbeard · Answer 7 · 2008-10-07T10:41:46.283

15

If you know what the sections are, you can do:

select top 10 * from table where section=1
union
select top 10 * from table where section=2
union
select top 10 * from table where section=3

edited Oct 07 '08 at 10:41

answered Oct 07 '08 at 02:10

Blorgbeard

101,031
48
228
272

5

This would be the easiest way of doing it. – Hector Sosa Jr Oct 07 '08 at 02:15
4

But this would be inefficient if you have 150 or if categories are variable by day, week, etc. – Rafa Barragan Sep 08 '17 at 15:26
4

Sure, but to quote OP: "Sections are business, local, and feature". If you have three static categories, this is the best way to do it. – Blorgbeard Sep 08 '17 at 15:31

score 9 · Answer 8 · answered Feb 14 '11 at 12:10

9

I know this thread is a little bit old but I've just bumped into a similar problem (select the newest article from each category) and this is the solution I came up with :

WITH [TopCategoryArticles] AS (
    SELECT 
        [ArticleID],
        ROW_NUMBER() OVER (
            PARTITION BY [ArticleCategoryID]
            ORDER BY [ArticleDate] DESC
        ) AS [Order]
    FROM [dbo].[Articles]
)
SELECT [Articles].* 
FROM 
    [TopCategoryArticles] LEFT JOIN 
    [dbo].[Articles] ON
        [TopCategoryArticles].[ArticleID] = [Articles].[ArticleID]
WHERE [TopCategoryArticles].[Order] = 1

This is very similar to Darrel's solution but overcomes the RANK problem that might return more rows than intended.

answered Feb 14 '11 at 12:10

Diadistis

12,086
1
33
55

Why use CTE Sir? Is it reduce memory consumption? – toha Sep 13 '16 at 10:22
@toha because CTEs are simpler and easier to understand – Reversed Engineer Dec 04 '19 at 18:23
Great answer!! It could be optimised by using inner `JOIN` instead of `LEFT JOIN`, since there will never be a record for `TopCategoryArticles` without a corresponding `Article` record. – Reversed Engineer Dec 04 '19 at 18:36

score 8 · Answer 9 · answered Nov 12 '17 at 06:55

8

Tried the following and it worked with ties too.

SELECT rs.Field1,rs.Field2 
FROM (
    SELECT Field1,Field2, ROW_NUMBER() 
      OVER (Partition BY Section
            ORDER BY RankCriteria DESC ) AS Rank
    FROM table
    ) rs WHERE Rank <= 10

answered Nov 12 '17 at 06:55

Raghu S

117
1
8

Hello there, the query works fine for me until I add an aggregated measure in the view. SELECT rs.Field1,rs.Field2,rs.measure FROM ( SELECT Field1,Field2, sum(cast(measure as INT)) over(partition by section) as agg_measure, ROW_NUMBER() OVER (Partition BY Section ORDER BY agg_measure DESC ) AS Rank FROM table ) rs WHERE Rank <= 10 Can you please help me where I am going wrong. – Harish Mahamure Oct 22 '20 at 15:43

score 6 · Answer 10 · answered Feb 01 '13 at 05:10

6

If you want to produce output grouped by section, displaying only the top n records from each section something like this:

SECTION     SUBSECTION

deer        American Elk/Wapiti
deer        Chinese Water Deer
dog         Cocker Spaniel
dog         German Shephard
horse       Appaloosa
horse       Morgan

...then the following should work pretty generically with all SQL databases. If you want the top 10, just change the 2 to a 10 toward the end of the query.

select
    x1.section
    , x1.subsection
from example x1
where
    (
    select count(*)
    from example x2
    where x2.section = x1.section
    and x2.subsection <= x1.subsection
    ) <= 2
order by section, subsection;

To set up:

create table example ( id int, section varchar(25), subsection varchar(25) );

insert into example select 0, 'dog', 'Labrador Retriever';
insert into example select 1, 'deer', 'Whitetail';
insert into example select 2, 'horse', 'Morgan';
insert into example select 3, 'horse', 'Tarpan';
insert into example select 4, 'deer', 'Row';
insert into example select 5, 'horse', 'Appaloosa';
insert into example select 6, 'dog', 'German Shephard';
insert into example select 7, 'horse', 'Thoroughbred';
insert into example select 8, 'dog', 'Mutt';
insert into example select 9, 'horse', 'Welara Pony';
insert into example select 10, 'dog', 'Cocker Spaniel';
insert into example select 11, 'deer', 'American Elk/Wapiti';
insert into example select 12, 'horse', 'Shetland Pony';
insert into example select 13, 'deer', 'Chinese Water Deer';
insert into example select 14, 'deer', 'Fallow';

answered Feb 01 '13 at 05:10

Craig Tullis

9,939
2
21
21

This doesn't work when I want just the first record for each section. It eliminates all section groups that have more than 1 record. I tried by replacing <=2 with <=1 – nils Jul 18 '16 at 14:52
@nils There are only three section values: deer, dog and horse. If you change the query to <= 1, you get one subsection for each section: American Elk/Wapiti for deer, Cocker Spaniel for dog and Appaloosa for horse. These are also the first values in each section alphabetically. The query is *meant* to eliminate all of the other values. – Craig Tullis Jul 18 '16 at 15:17
But when I try to run your query, it eliminates everything because the count is >=1 for everything. It doesn't preserve the 1st subsection for each section. Can you try to run your query for <=1 and let me know if you get the first subsection for each section? – nils Jul 19 '16 at 16:28
@nils Hi, I did recreate this little test database from the scripts and ran the query using <= 1, and it returned the first subsection value from each section. What database server are you using? There's always a chance it's related to your database of choice. I just ran this in MySQL because it was handy and it behaved as expected. I'm pretty sure when I did it the first time (I wanted to make sure what I posted actually worked without debuggin), I'm pretty sure I did it using either Sybase SQL Anywhere or MS SQL Server. – Craig Tullis Jul 20 '16 at 22:38
it worked perfectly for me in mysql. I changed a query little bit not sure why did he used <= for varchar field in subsection.. i changed it to and x2.subsection = x1.subsection – Mahen Nakar Jan 30 '20 at 04:50

score 6 · Answer 11 · edited Sep 14 '13 at 18:21

Q) Finding TOP X records from each group(Oracle)

SQL> select * from emp e 
  2  where e.empno in (select d.empno from emp d 
  3  where d.deptno=e.deptno and rownum<3)
  4  order by deptno
  5  ;

 EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO

  7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
  7839 KING       PRESIDENT            17-NOV-81       5000                    10
  7369 SMITH      CLERK           7902 17-DEC-80        800                    20
  7566 JONES      MANAGER         7839 02-APR-81       2975                    20
  7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
  7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30

6 rows selected.

The question was about SQL Server, not Oracle. – Craig Tullis Jul 18 '16 at 15:21 — Craig Tullis, Jul 18 '16 at 15:21

score 5 · Answer 12 · answered May 14 '18 at 09:07

5

While the question was about SQL Server 2005, most people have moved on and if they do find this question, what could be the preferred answer in other situations is one using CROSS APPLY as illustrated in this blog post.

SELECT *
FROM t
CROSS APPLY (
  SELECT TOP 10 u.*
  FROM u
  WHERE u.t_id = t.t_id
  ORDER BY u.something DESC
) u

This query involves 2 tables. The OP's query only involves 1 table, in case of which a window function based solution might be more efficient.

answered May 14 '18 at 09:07

Lukas Eder

211,314
129
689
1,509

This was good stuff and appreciate the sql support and also used this link to get me going too. https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/ Thanks. Doug Lubey of Lousiana. – Doug Lubey of Louisiana May 11 '23 at 20:24

score 4 · Answer 13 · answered Oct 07 '08 at 02:05

4

Might the UNION operator work for you? Have one SELECT for each section, then UNION them together. Guess it would only work for a fixed number of sections though.

answered Oct 07 '08 at 02:05

sblundy

60,628
22
121
123

score 1 · Answer 14 · answered Jan 25 '17 at 22:41

1

You can try this approach. This query returns 10 most populated cities for each country.

   SELECT city, country, population
   FROM
   (SELECT city, country, population, 
   @country_rank := IF(@current_country = country, @country_rank + 1, 1) AS country_rank,
   @current_country := country 
   FROM cities
   ORDER BY country, population DESC
   ) ranked
   WHERE country_rank <= 10;

answered Jan 25 '17 at 22:41

Ali

331
2
4

This solution does not pass a test case when we have a table with a record of one country with 9 same population for example it returns null instead of returning all 9 available records in order. Any suggestion to fix this issue? – Mojgan Mazouchi Jul 23 '18 at 23:31

dkellner · Answer 15 · 2023-03-29T18:31:54.257

Note: I know the OP has only 3 groups but this is a known general problem for many developers and there's no really good solution in SQL. So let me show you another way.

IN THEORY:

You can write it as one query. That is, formally one query, but it contans either subqueries or a self-JOIN which makes it in fact multiple operations under the hood. So you might as well just select each group individually.

IN PRACTICE:

If you want a performant solution, you need to work a bit more. Let's say you have 100 employees, you have 26 buildings from A to Z, people move around them (enter/leave), and you need the last 5 events for every building.

EVENT_ID   EVENT_TIME            EMPOYEE_ID   EVENT_CODE   BUILDING                           
-------------------------------------------------------------------------                                
883691     2023-03-29 11:00:00   92           enter        A                                           
883690     2023-03-29 11:00:21   78           enter        C                                           
883689     2023-03-29 11:00:25   58           enter        A                                           
883688     2023-03-29 11:02:10   22           leave        H                                           
883687     2023-03-29 11:31:42   73           leave        P                                           
...
...

You want to avoid 26 queries.

Here's what you can do:

Write a query with a simple ORDER BY EVENT_ID DESC (or EVENT_TIME DESC), to get the last N events for all buildings.
Set N (the limit) to a reasonable estimate that will have data for most buildings, you don't need to have all of them but the more the better. Let's say LIMIT 5000.
Process the result set on the application side, to see which buildings don't have the top 10 in it.
For those buildings, run separate queries to get their top 10.

For theorists, this is an anti-pattern. But the first query will have almost the same performance as one of the single-building ones, and probably brings you most of what you need; a few buildings will be missing, depending on how the employees normally move. Then, you may need 5 more queries for those buildings, and some merging on the application level.

In short: get a result that's almost complete, then make it complete.

So if you need performance, this is one way. If you need clarity of the business logic, well, choose any other answer. This one is scary. But when you go for speed, you often need scary techniques.

Select top 10 records for each category

15 Answers15

IN THEORY:

IN PRACTICE:

Linked

Related