Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

4082 questions
11
votes
2 answers

GROUP BY consecutive dates delimited by gaps

Assume you have (in Postgres 9.1 ) a table like this: date | value which have some gaps in it (I mean: not every possible date between min(date) and max(date) has it's row). My problem is how to aggregate this data so that each consistent group…
One Data Guy
  • 303
  • 1
  • 3
  • 8
10
votes
4 answers

Speed of paged queries in Oracle

This is a never-ending topic for me and I'm wondering if I might be overlooking something. Essentially I use two types of SQL statements in an application: Regular queries with a "fallback" limit Sorted and paged queries Now, we're talking about…
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
10
votes
2 answers

How to get group number in postgresql?

I have a use case where I want to assign a unique increasing partition number to each partition when using a window query in postgres. For example, I want to get an output like this: Partition Key | Row_Num() | Partition Number Apple | 1 …
Pratik Singhal
  • 6,283
  • 10
  • 55
  • 97
10
votes
1 answer

How do window functions and the group by clause interact?

I do understand window functions and group by separately. But what happens when you use both a window function and a group by clause in the same query ? Are the selected rows grouped first, then considered by the window function ? Or does the…
Benjamin Crouzier
  • 40,265
  • 44
  • 171
  • 236
10
votes
3 answers

Oracle Lag function with dynamic parameter

I have a specific problem. I have a table which contains invalid values. I need to replace the invalid values (here 0) with the previous value which is bigger than 0. The difficulty is, it is not appropiate for me to use an Update or an…
aljassi
  • 246
  • 2
  • 10
10
votes
4 answers

SQLServer count() over() with distinct

I'm working on a project, where we need to count the number of distinct rows. A simplified version of the scenario includes a user table, a keyword table and a keyword_user table. The user table just includes common user meta data, like name etc.…
Teilmann
  • 2,146
  • 7
  • 28
  • 57
10
votes
3 answers

Spark and SparkSQL: How to imitate window function?

Description Given a dataframe df id | date --------------- 1 | 2015-09-01 2 | 2015-09-01 1 | 2015-09-03 1 | 2015-09-04 2 | 2015-09-04 I want to create a running counter or index, grouped by the same id and sorted by date in that group,…
Martin Senne
  • 5,939
  • 6
  • 30
  • 47
10
votes
1 answer

SPARK SQL Equivalent of Qualify + Row_number statements

Does anyone know the best way for Apache Spark SQL to achieve the same results as the standard SQL qualify() + rnk or row_number statements? For example: I have a Spark Dataframe called statement_data with 12 monthly records each for 100 unique…
10
votes
4 answers

Calculating median with Group By in AWS Redshift

I've seen other posts about using the median() window function in Redshift, but how would you use it with a query that has a group by at the end? For example, assume table course: Course | Subject | Num_Students ------------------------------- 1 …
clavid
  • 289
  • 5
  • 13
10
votes
1 answer

difference from first in group with dplyr

I'm trying to create a window function with dplyr, that will return a new vector with the difference between each value and the first of its group. For example, given this dataset: dummy <- data.frame(userId=rep(1,6), libId=rep(999,6), …
Omri374
  • 2,555
  • 3
  • 26
  • 40
10
votes
2 answers

How to reference data from next row?

I'm writing a function in PostgreSQL 9.2. For a table of stock prices and dates, I want to calculate the percentage change from the previous day for each entry. For the earliest day of data, there won't be a previous day, so that entry can simply be…
Terence Chow
  • 10,755
  • 24
  • 78
  • 141
9
votes
2 answers

Using an alias in a window function in a query in PostgreSQL

I am using PostgreSQL version 9.1 and looking at the Postgres docs, I know it is possible to do the following: SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; and this works for my queries. Now I need to be able to specify an…
EkcenierK
  • 1,429
  • 1
  • 19
  • 34
9
votes
1 answer

Spark "first" Window function is taking much longer than "last"

I'm working on a pyspark routine to interpolate the missing values in a configuration table. Imagine a table of configuration values that go from 0 to 50,000. The user specifies a few data points in between (say at 0, 50, 100, 500, 2000, 500000) and…
Alain
  • 26,663
  • 20
  • 114
  • 184
9
votes
4 answers

Why doesn't this windowed expression result in a divide by zero error?

I came across this answer on Programming Puzzles & Code Golf. In it, the author uses the expression (though the answer has since been edited to use a different solution): row_number()over(order by 1/0) I would have expected the 1/0 to result in a…
HoneyBadger
  • 14,750
  • 3
  • 34
  • 48
9
votes
2 answers

pandas rolling() function with monthly offset

I'm trying to use the rolling() function on a pandas data frame with monthly data. However, I dropped some NaN values, so now there are some gaps in my time series. Therefore, the basic window parameter gives a misleading answer since it just looks…
Jesse Blocher
  • 523
  • 1
  • 4
  • 16