In almost all RDMS products, we have windowing functions. Broadly, they are classified as:
- Ranking functions (e.g: rank(), dense_rank() etc...)
- Aggregate functions (e.g: sum(), avg() etc..)
- Analytic functions (e.g: first(), lag() etc..)
In my understanding (by reading various blogs/documents), the 'frame' (either using rows/range) is considered into account only when we use aggregate functions. while applying ranking and analytical functions, 'frame' is not considered (either the default frame or explicitly provided frame).
Is my understanding correct ?
For example: let us consider a table having: id, year and salary. We have multiple ids and under each id & we have multiple year values. All salary values for an id are distinct.
And let us consider the below cases (used I select clause):
rank() over( partition by id order by year)
rank() over( partition by id order by year rows between 1 preceding and current row)
rank() over( partition by id order by year range between 1 preceding and current row)
All of the above versions return the exact same results.
But when we use sum(salary)
and apply the same above over()
clauses, they return different results.
Input table:
--------------------
Id Year Salary
1 2005 10000
1 2007 20000
1 2009 30000
1 2010 40000
2 2005 10000
2 2007 20000
2 2009 30000
2 2010 40000
All the below queries give same results when rank is calculated; which implies that range is NOT taken into account while deriving rank.
select id
,year
,salary
,rank() over( partition by id order by year) as x
from practice;
select id
,year
,salary
,rank() over( partition by id order by year rows between 1 preceding and current row) as x
from practice;
select id
,year
,salary
,rank() over( partition by id order by year range between 1 preceding and current row) as x
from practice;
Result:
Id Year Salary X
1 2005 10000 1
1 2007 20000 2
1 2009 30000 3
1 2010 40000 4
2 2005 10000 1
2 2007 20000 2
2 2009 30000 3
2 2010 40000 4
When sum(salary) is given, they start giving different results.
Thanks!