Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

4082 questions
13
votes
2 answers

How to write SQL window functions in pandas

Is there an idiomatic equivalent to SQL's window functions in Pandas? For example, what's the most compact way to write the equivalent of this in Pandas? SELECT state_name, state_population, SUM(state_population) OVER() AS…
2daaa
  • 2,788
  • 7
  • 33
  • 44
13
votes
1 answer

SQL Server Performance Comparison Between Over Partition By And Group By

Although a couple of questions were already posted in SO about the difference between Over Partition By and Group By, I did not find a definitive conclusion about which performs better. I set up a simple scenario at SqlFiddle, where Over (Partition…
Veverke
  • 9,208
  • 4
  • 51
  • 95
13
votes
1 answer

Aggregate continuous ranges of dates

Let's say you have the following PostgreSQL sparse table listing reservation dates: CREATE TABLE reserved_dates ( reserved_days_id SERIAL NOT NULL, reserved_date DATE NOT NULL ); INSERT INTO reserved_dates (reserved_date)…
13
votes
2 answers

Aggregate values over a range of hours, every hour

I have a PostgreSQL 9.1 database with a table containing a timestamp and a measuring value '2012-10-25 01:00' 2 '2012-10-25 02:00' 5 '2012-10-25 03:00' 12 '2012-10-25 04:00' 7 '2012-10-25 05:00' 1 ... ... I need to…
12
votes
1 answer

Django ORM group by, and find latest item of each group (window functions)

Say we have a model as below class Cake(models.Model): baked_on = models.DateTimeField(auto_now_add=True) cake_name = models.CharField(max_length=20) Now, there are multiple Cakes baked on the same day, and I need a query that will return…
12
votes
2 answers

How to use lag and rangeBetween functions on timestamp values?

I have data that looks like this: userid,eventtime,location_point 4e191908,2017-06-04 03:00:00,18685891 4e191908,2017-06-04 03:04:00,18685891 3136afcb,2017-06-04 03:03:00,18382821 661212dd,2017-06-04 03:06:00,80831484 40e8a7c3,2017-06-04…
ultraInstinct
  • 4,063
  • 10
  • 36
  • 53
12
votes
3 answers

Analytic count over partition with and without ORDER BY clause

I don't understand why there are different results when using an ORDER BY clause in an analytic COUNT function. Using a simple example: with req as (select 1 as n, 'A' as cls from dual union select 2 as n, 'A' as cls from dual) select…
Carmellose
  • 4,815
  • 10
  • 38
  • 56
12
votes
1 answer

Calculating SQL Server ROW_NUMBER() OVER() for a derived table

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance: ROW_NUMBER() OVER() This is particularly useful when used with ordered derived tables, such as: SELECT…
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
12
votes
2 answers

Last_value window function doesn't work properly

Last_value window function doesn't work properly. CREATE TABLE EXAMP2 ( CUSTOMER_ID NUMBER(38) NOT NULL, VALID_FROM DATE NOT NULL ); Customer_id …
user2331299
  • 121
  • 1
  • 4
11
votes
3 answers

PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

I'm looking for something similar this in SQL Server: SELECT TOP n WITH TIES FROM tablename I know about LIMIT in PostgreSQL, but does the equivalent of the above exist? I'm just curious as it would save an extra query each time for me. If I have a…
Andrew
  • 2,519
  • 6
  • 29
  • 46
11
votes
1 answer

Generate id row for a view with grouping

I'm trying to create a view with row numbers like so: create or replace view daily_transactions as select generate_series(1, count(t)) as id, t.ic, t.bio_id, t.wp, date_trunc('day', t.transaction_time)::date…
Random Joe
  • 640
  • 4
  • 10
  • 25
11
votes
1 answer

Will Postgres push down a WHERE clause into a VIEW with a Window Function (Aggregate)?

The docs for Pg's Window function say: The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because…
Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
11
votes
2 answers

Optimizing window function in PostgreSQL to use index

I have a table in a PostgreSQL 9.2 DB, created and filled as follows: CREATE TABLE foo( id integer, date date ); INSERT INTO foo SELECT (id % 10) + 1, now() - (id % 50) * interval '1 day' FROM generate_series(1, 100000) AS id; Now, I need to find…
St.Antario
  • 26,175
  • 41
  • 130
  • 318
11
votes
4 answers

Select a row of first non-null values in a sparse table

Using the following table: A | B | C | ts --+------+------+------------------ 1 | null | null | 2016-06-15 10:00 4 | null | null | 2016-06-15 11:00 4 | 9 | null | 2016-06-15 12:00 5 | 1 | 7 | 2016-06-15 13:00 How do I select the…
Alex B
  • 82,554
  • 44
  • 203
  • 280
11
votes
2 answers

Select random row for each group

I have a table like this ID ATTRIBUTE 1 A 1 A 1 B 1 C 2 B 2 C 2 C 3 A 3 B 3 C I'd like to select just one random attribute for each ID. The result therefore could look like this (although this is just one…
speendo
  • 13,045
  • 22
  • 71
  • 107