Given the following table, the question is to find for example the top N C2 from each C1.
C1 C2
1 1
1 2
1 3
1 4
1 ...
2 1
2 2
2 3
2 4
2 ...
....
So if N = 3, the results are
C1 C2
1 1
1 2
1 3
2 1
2 2
2 3
....
The proposed solutions use the window function and partition by
For example,
SELECT rs.Field1,rs.Field2
FROM (
SELECT Field1,Field2, Rank()
over (Partition BY Section
ORDER BY RankCriteria DESC ) AS Rank
FROM table
) rs WHERE Rank <= 3
I guess what it does is sorting then picking the top N.
However if some categories have less N elements, we can get the top N w/o sorting because the top N must include all elements in the category.
The above query uses Rank(). My question applies to other window functions like row_num() or dense_rank().
Is there a way to ignore the sorting at the case?
Also I am not sure if the underlying engine can optimize the case: whether the inner partition/order considers the outer where constraints before sorting.
Using partition+order+where is a way to get the top-N element from each category. It works perfect if each category has more than N element, but has additional sorting cost otherwise. My question is if there is another approach that works well at both cases. Ideally it does the following
for each category {
if # of element <= N:
continue
sort and get the top N
}
For example, but is there a better SQL?
WITH table_with_count AS (
SELECT Field1, Field2, RankCriteria, count() over (PARTITION BY Section) as c
FROM table
),
rs AS (
SELECT Field1,Field2, Rank()
over (Partition BY Section
ORDER BY RankCriteria DESC ) AS Rank
FROM table_with_count
where c > 10
)
(SELECT Field1,Field2e FROM rs WHERE Rank <= 10)
union
(SELECT Field1,Field2 FROM table_with_count WHERE c <= 10)