Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

4082 questions
56
votes
3 answers

Spark Window Functions - rangeBetween dates

I have a Spark SQL DataFrame with date column, and what I'm trying to get is all the rows preceding current row in a given date range. So for example I want to have all the rows from 7 days back preceding given row. I figured out, I need to use a…
Nhor
  • 3,860
  • 6
  • 28
  • 41
49
votes
4 answers

pyspark: rolling average using timeseries data

I have a dataset consisting of a timestamp column and a dollars column. I would like to find the average number of dollars per week ending at the timestamp of each row. I was initially looking at the pyspark.sql.functions.window function, but that…
Bob Swain
  • 3,052
  • 3
  • 17
  • 28
47
votes
1 answer

PostgreSQL: running count of rows for a query 'by minute'

I need to query for each minute the total count of rows up to that minute. The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute: SELECT COUNT(id) AS count , EXTRACT(hour from…
GabiMe
  • 18,105
  • 28
  • 76
  • 113
44
votes
4 answers

What is the difference between rowsBetween and rangeBetween?

From the PySpark docs rangeBetween: rangeBetween(start, end) Defines the frame boundaries, from start (inclusive) to end (inclusive). Both start and end are relative from the current row. For example, “0” means “current row”, while “-1” means one…
Evan Zamir
  • 8,059
  • 14
  • 56
  • 83
43
votes
2 answers

pyspark: count distinct over a window

I just tried doing a countDistinct over a window and got this error: AnalysisException: u'Distinct window functions are not supported: count(distinct color#1926) Is there a way to do a distinct count over a window in pyspark? Here's some example…
42
votes
4 answers

What is the Hamming window for?

I'm working with some code that does a Fourier transform (to calculate the cepstrum of an audio sample). Before it computes the Fourier transform, it applies a Hamming window to the sample: for(int i = 0; i < SEGMENTATION_LENGTH;i++){ …
fredley
  • 32,953
  • 42
  • 145
  • 236
36
votes
3 answers

How to use ROW_NUMBER() in UPDATE clause?

ROW_NUMBER() is only for used in the SELECT clause in MS SQL Server, but I want to use it for update like the following: Update MyTab Set MyNo = 123 + ROW_NUMBER() over (Order By ID) Where a=b; then I got Error like, Windowed functions can only…
KentZhou
  • 24,805
  • 41
  • 134
  • 200
35
votes
8 answers

How to perform grouped ranking in MySQL

So I have a table as follows: ID_STUDENT | ID_CLASS | GRADE ----------------------------- 1 | 1 | 90 1 | 2 | 80 2 | 1 | 99 3 | 1 | 80 4 | 1 | 70 5 | 2 …
achinda99
  • 5,020
  • 4
  • 34
  • 42
35
votes
6 answers

Dynamic alternative to pivot with CASE and GROUP BY

I have a table that looks like this: id feh bar 1 10 A 2 20 A 3 3 B 4 4 B 5 5 C 6 6 D 7 7 D 8 8 D And I want it to look like this: bar val1 val2 val3 A 10 20…
flipflop99
  • 583
  • 1
  • 5
  • 9
32
votes
1 answer

Avoid performance impact of a single partition mode in Spark window functions

My question is triggered by the use case of calculating the differences between consecutive rows in a spark dataframe. For example, I have: >>> df.show() +-----+----------+ |index| col1| +-----+----------+ | 0.0|0.58734024| | …
32
votes
2 answers

Applying a Window function to calculate differences in pySpark

I am using pySpark, and have set up my dataframe with two columns representing a daily asset price as follows: ind = sc.parallelize(range(1,5)) prices = sc.parallelize([33.3,31.1,51.2,21.3]) data = ind.zip(prices) df =…
Thomas Moore
  • 941
  • 2
  • 11
  • 17
30
votes
2 answers

Filtering by window function result in Postgresql

Ok, initially this was just a joke we had with a friend of mine, but it turned into interesting technical question :) I have the following stuff table: CREATE TABLE stuff ( id serial PRIMARY KEY, volume integer NOT NULL DEFAULT 0, …
Maxim Sloyko
  • 15,176
  • 9
  • 43
  • 49
30
votes
1 answer

Row_number over (Partition by yyy) in Entity Framework

I want to load data by Row_number over Partition by using EF. SELECT * FROM ( SELECT sf.SerialFlowsId ,sf.GoodsSerialId ,d.FormTypeId , d.GoodsId …
Mahdi Farhani
  • 964
  • 1
  • 9
  • 22
30
votes
6 answers

Count distinct values with OVER(PARTITION BY id)

Is it possible to count distinct values in conjunction with window functions like OVER(PARTITION BY id)? Currently my query is as follows: SELECT congestion.date, congestion.week_nb, congestion.id_congestion, congestion.id_element, ROW_NUMBER()…
user007
  • 1,064
  • 3
  • 12
  • 29
29
votes
2 answers

Applying Multiple Window Functions On Same Partition

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary) For example you can do SELECT name, first_value() over (partition by name order by date) from table1 But is there a way to…
Verhogen
  • 27,221
  • 34
  • 90
  • 109