Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

4082 questions
8
votes
3 answers

Is it possible to ignore null values when using LEAD window function in Spark?

My dataframe like this id value date 1 100 2017 1 null 2016 1 20 2015 1 100 2014 I would like to get most recent previous value but ignoring null id value date recent value 1 100 2017 20 1 null 2016 …
John
  • 1,531
  • 6
  • 18
  • 30
8
votes
1 answer

PySpark Window Function: multiple conditions in orderBy on rangeBetween/rowsBetween

Is it possible to create a Window function that can have multiple conditions in orderBy for rangeBetween or rowsBetween. Assume I have a data frame like below. user_id timestamp date event 0040b5f0 2018-01-22 13:04:32 …
8
votes
2 answers

Django filtering on Window functions

I have two Models in Django, A and B. Each A has several Bs assigned to it, and the Bs are ordered, which is done with a field B.order_index that counts upwards from zero for any A. I want to write a query that checks if there is any A where some of…
Florian Dietz
  • 877
  • 9
  • 20
8
votes
1 answer

How do I create named window partitions (aliases) in PostgreSQL?

The documentation for PostgreSQL window functions seems to imply you can use the same named window in multiple places in your query. However, I can't figure out how do I create a named window? SELECT first_value(vin) OVER( PARTITION BY vin ) AS w,…
Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
8
votes
1 answer

Cumulative sum of values by month, filling in for missing months

I have this data table and I'm wondering if is possible create a query that get a cumulative sum by month considering all months until the current month. date_added | qty ------------------------------------ 2015-08-04…
Goku
  • 1,750
  • 5
  • 23
  • 35
8
votes
2 answers

How to use a ring data structure in window functions

I have data that is arranged in a ring structure (or circular buffer), that is it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring: I'd like to create a window query that can…
Mike T
  • 41,085
  • 18
  • 152
  • 203
8
votes
4 answers

Find rows with duplicate values in a column

I have a table author_data: author_id | author_name ----------+---------------- 9 | ernest jordan 14 | k moribe 15 | ernest jordan 25 | william h nailon 79 | howard jason 36 | k moribe Now I need…
7
votes
3 answers

How to use LAST_VALUE in PostgreSQL?

I have a little table to try to understand how the LAST_VALUE function works in PostgreSQL. It looks like this: id | value ----+-------- 0 | A 1 | B 2 | C 3 | D 4 | E 5 | [null] 6 | F What I want to do is to use LAST_VALUE to fill…
Gaëtan
  • 779
  • 1
  • 8
  • 26
7
votes
0 answers

How to write MAX and OVER (PARTITION BY) functions in JPA query

I need to get one column(revision) maximum value based on another column(drawingNumber). Can anyone tell me the JPA query for this functionality. I have written the following query and this query is not working. Please help me how to write MAX…
raj
  • 91
  • 1
  • 5
7
votes
2 answers

PostgreSQL Window Functions

Consider the following table structure: CREATE TABLE tb_log ( id INTEGER PRIMARY KEY, moment DATE, old INTEGER, actual INTEGER ); Containing the data: INSERT INTO tb_log ( id, moment, old, actual ) VALUES ( 1, '2018-06-19', 10, 20 ), …
Lacobus
  • 1,590
  • 12
  • 20
7
votes
3 answers

lag to get first non null value since the previous null value

Below is an example of what I'm trying to achieve in a Redshift Database. I have a variable current_value and I want to create a new column value_desired that is: the same as current_value if the previous row is null equal to the last preceding…
Francesco Rinaldi
  • 163
  • 1
  • 1
  • 8
7
votes
2 answers

Fill in missing rows when aggregating over multiple fields in Postgres

I am aggregating sales for a set of products per day using Postgres and need to know not just when sales do happen, but also when they do not for further processing. SELECT sd.date, COUNT(sd.sale_id) AS sales, sd.product FROM sales_data sd --…
7
votes
3 answers

First and last value of window function in one row in PostgreSQL

I'd like to have first value of one column and last value of second column in one row for a specified partition. For that I created this query: SELECT DISTINCT b.machine_id, batch, timestamp_sta, timestamp_stp, FIRST_VALUE(timestamp_sta) OVER w AS…
Michal Špondr
  • 1,337
  • 2
  • 21
  • 44
7
votes
3 answers

Window Functions: last_value(ORDER BY ... ASC) same as last_value(ORDER BY ... DESC)

Sample data CREATE TABLE test (id integer, session_ID integer, value integer) ; INSERT INTO test (id, session_ID, value) VALUES (0, 2, 100), (1, 2, 120), (2, 2, 140), (3, 1, 900), (4, 1, 800), (5, 1, 500) ; Current…
S-Man
  • 22,521
  • 7
  • 40
  • 63
7
votes
2 answers

CodeGen grows beyond 64 KB error when normalizing large PySpark dataframe

I have a PySpark dataframe with 13 million rows and 800 columns. I need to normalize this data so have been using this code, which works with a smaller development data set. def z_score_w(col, w): avg_ = avg(col).over(w) stddev_ =…
MarkNS
  • 3,811
  • 2
  • 43
  • 60