Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

4082 questions
28
votes
2 answers

SQL window function with a where clause?

I'm trying to correlate two types of events for users. I want to see all event "B"s along with the most recent event "A" for that user prior to the "A" event. How would one accomplish this? In particular, I'm trying to do this in Postgres. I was…
MJ.
  • 1,269
  • 4
  • 12
  • 24
26
votes
1 answer

How to use window functions in PySpark?

I'm trying to use some windows functions (ntile and percentRank) for a data frame but I don't know how to use them. Can anyone help me with this please? In the Python API documentation there are no examples about it. Specifically, I'm trying to…
jegordon
  • 1,157
  • 3
  • 14
  • 17
25
votes
2 answers

Spark SQL window function with complex condition

This is probably easiest to explain through example. Suppose I have a DataFrame of user logins to a website, for instance: scala> df.show(5) +----------------+----------+ | …
user4601931
  • 4,982
  • 5
  • 30
  • 42
24
votes
2 answers

Unexpected results when using FIRST_VALUE() in SQL Server 2012

When I use FIRST_VALUE on a data set that I construct by hand I get one result, and when I use it on a data set that results from a left join, I get a different result - even though the data sets appear to me to contain the exact same data values.…
23
votes
3 answers

PostgreSQL - how should I use first_value()?

This answer to shows how to produce High/Low/Open/Close values from a ticker: Retrieve aggregates for arbitrary time intervals I am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for…
Brent.Longborough
  • 9,567
  • 10
  • 42
  • 62
22
votes
4 answers

Why do I need to apply a window function to samples when building a power spectrum of an audio signal?

I have found for several times the following guidelines for getting the power spectrum of an audio signal: collect N samples, where N is a power of 2 apply a suitable window function to the samples, e.g. Hanning pass the windowed samples to an FFT…
Nuno Santos
  • 1,476
  • 3
  • 17
  • 34
21
votes
2 answers

How to use a SQL window function to calculate a percentage of an aggregate

I need to calculate percentages of various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator, however I am having an issue because the numerator has to be an aggregate as well. As a simple…
21
votes
3 answers

Partitioning by multiple columns in PySpark with columns in a list

My question is similar to this thread: Partitioning by multiple columns in Spark SQL but I'm working in Pyspark rather than Scala and I want to pass in my list of columns as a list. I want to do something like this: column_list =…
prk
  • 319
  • 1
  • 3
  • 10
21
votes
2 answers

Does Spark know the partitioning key of a DataFrame?

I want to know if Spark knows the partitioning key of the parquet file and uses this information to avoid shuffles. Context: Running Spark 2.0.1 running local SparkSession. I have a csv dataset that I am saving as parquet file on my disk like…
astro_asz
  • 2,278
  • 3
  • 15
  • 31
21
votes
2 answers

spark sql window function lag

I am looking at the window slide function for a Spark DataFrame in Scala. I have a DataFrame with columns Col1, Col2, Col3, date, volume and new_col. Col1 Col2 Col3 date volume new_col 201601 100.5 …
Ramesh
  • 1,563
  • 9
  • 25
  • 39
20
votes
3 answers

Count rows in partition with Order By

I was trying to understand PARTITION BY in postgres by writing a few sample queries. I have a test table on which I run my query. id integer | num integer ___________|_____________ 1 | 4 2 | 4 3 | 5 4 | 6 When I…
kishore
  • 604
  • 3
  • 7
  • 13
20
votes
1 answer

What's the default window frame for window functions

Running the following code: val sales = Seq( (0, 0, 0, 5), (1, 0, 1, 3), (2, 0, 2, 1), (3, 1, 0, 2), (4, 2, 0, 8), (5, 2, 2, 8)) .toDF("id", "orderID", "prodID", "orderQty") val orderedByID = Window.orderBy('id') val totalQty =…
Tom
  • 5,848
  • 12
  • 44
  • 104
20
votes
3 answers

Referencing current row in FILTER clause of window function

In PostgreSQL 9.4 the window functions have the new option of a FILTER to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant…
Patrick
  • 29,357
  • 6
  • 62
  • 90
20
votes
2 answers

How to make a SUM without group by

Here is my problem.. Actual Auction Ammanat id 7000 500 100 228,229 7000 100 100 228,229 7000 900 100 228,229 5000 0 0 230 I want result as given below Actual Auction …
Aijaz Chauhan
  • 1,511
  • 3
  • 25
  • 53
19
votes
2 answers

T-SQL calculate moving average

I am working with SQL Server 2008 R2, trying to calculate a moving average. For each record in my view, I would like to collect the values of the 250 previous records, and then calculate the average for this selection. My view columns are as…
RunW
  • 269
  • 1
  • 2
  • 12