Questions tagged [window-functions]

A window function is a type of SQL operation that performs a calculation across a partition of the result set, without collapsing the rows the way a GROUP BY aggregate does.

As a basic example of a window function, consider a table that contains a list of persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function, such a query would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons
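
Note that RANK() assigns tied rows the same rank and leaves gaps after the ties, while DENSE_RANK() leaves no gaps and ROW_NUMBER() numbers rows consecutively regardless of ties. As a sketch using the same persons table, all three can be compared side by side:

    SELECT name,
           age,
           gender,
           RANK()       OVER (PARTITION BY gender ORDER BY age DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS dense_rnk,
           ROW_NUMBER() OVER (PARTITION BY gender ORDER BY age DESC) AS row_num
    FROM persons;

If two people of the same gender share the oldest age, both get rnk = 1 and dense_rnk = 1, but row_num will arbitrarily order them 1 and 2, and the next rnk value will be 3 while dense_rnk continues with 2.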

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() and lead() functions. The following query calculates, for each customer, the delta of the amount column relative to the previous row:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS delta
FROM orders
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as window functions. When combined with an ORDER BY inside the window definition, this produces "running" totals based on the aggregate being used.
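
As a sketch, assuming an orders table like the one in the lag() example above, a running total of each customer's order amounts could be written as:

    SELECT customer_id,
           order_date,
           amount,
           SUM(amount) OVER (PARTITION BY customer_id
                             ORDER BY order_date) AS running_total
    FROM orders
    ORDER BY customer_id, order_date;

The ORDER BY inside the OVER clause is what makes the sum cumulative: each row's running_total covers all of that customer's rows up to and including the current order_date, rather than the whole partition.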

Window functions are available in a wide range of DBMS: Oracle (called "analytic functions" there), Microsoft SQL Server, DB2 (called "OLAP functions" there), PostgreSQL, Teradata, Sybase, Vertica, and partially in the H2 database.

4082 questions
18
votes
2 answers

User defined function to be applied to Window in PySpark?

I am trying to apply a user defined function to Window in PySpark. I have read that UDAF might be the way to go, but I was not able to find anything concrete. To give an example (taken from here: Xinh's Tech Blog and modified for PySpark): from…
17
votes
1 answer

Postgres window function and group by exception

I'm trying to put together a query that will retrieve the statistics of a user (profit/loss) as a cumulative result, over a period of time. Here's the query I have so far: SELECT p.name, e.date, sum(sp.payout) OVER (ORDER BY e.date) -…
Martin
  • 2,180
  • 4
  • 21
  • 41
17
votes
1 answer

PySpark / Spark Window Function First/ Last Issue

From my understanding, the first/last functions in Spark will retrieve the first/last row of each partition. I am not able to understand why the LAST function is giving incorrect results. This is my code. AgeWindow =…
Nikhil Redij
  • 1,011
  • 1
  • 14
  • 21
17
votes
5 answers

Rounding numbers to the nearest 10 in Postgres

I'm trying to solve this particular problem from PGExercises.com: https://www.pgexercises.com/questions/aggregates/rankmembers.html The gist of the question is that I'm given a table of club members and half hour time slots that they have booked…
neuron
  • 551
  • 3
  • 9
  • 16
16
votes
4 answers

MySql using correct syntax for the over clause

What is the correct syntax to get the over clause to work in mysql? I would like to see the total sms's sent by each user without grouping it with the group by clause. SELECT username, count(sentSmsId) OVER (userId) FROM …
Nightwolf
  • 945
  • 2
  • 9
  • 23
16
votes
1 answer

PostgreSQL window function: row_number() over (partition col order by col2)

Following result set is derived from a sql query with a few joins and a union. The sql query already groups rows on Date and game. I need a column to describe the number of attempts at a game partitioned by date column. Username Game ID …
user1951677
  • 329
  • 1
  • 3
  • 11
15
votes
3 answers

GROUP BY and aggregate sequential numeric values

Using PostgreSQL 9.0. Let's say I have a table containing the fields: company, profession and year. I want to return a result which contains unique companies and professions, but aggregates (into an array is fine) years based on numeric…
Joshua Burns
  • 8,268
  • 4
  • 48
  • 61
15
votes
1 answer

Difference between "ROWS BETWEEN" and "RANGE BETWEEN" in (Presto) window function "OVER" clause

This question is primarily about older versions of PrestoSQL, which have been resolved in the (now renamed) Trino project as of version 346. However, Amazon's Athena project is based off of Presto versions 0.217 (Athena Engine 2) and 0.172 (Athena…
mbafford
  • 2,266
  • 1
  • 20
  • 25
15
votes
6 answers

Getting the First and Last Row Using ROW_NUMBER and PARTITION BY

Sample Input Name | Value | Timestamp -----|-------|----------------- One | 1 | 2016-01-01 02:00 Two | 3 | 2016-01-01 03:00 One | 2 | 2016-01-02 02:00 Two | 4 | 2016-01-03 04:00 Desired Output Name | Value | EarliestTimestamp |…
Muhammad Rehan Saeed
  • 35,627
  • 39
  • 202
  • 311
15
votes
2 answers

Partitioning by multiple columns in Spark SQL

With Spark SQL's window functions, I need to partition by multiple columns to run my data queries, as follows: val w = Window.partitionBy($"a").partitionBy($"b").rangeBetween(-100, 0) I currently do not have a test environment (working on settings…
Eric Staner
  • 969
  • 2
  • 9
  • 14
15
votes
1 answer

Spark Task not serializable with lag Window function

I've noticed that after I use a Window function over a DataFrame if I call a map() with a function, Spark returns a "Task not serializable" Exception This is my code: val hc:org.apache.spark.sql.hive.HiveContext = new…
15
votes
2 answers

TSQL OVER clause: COUNT(*) OVER (ORDER BY a)

This is my code: USE [tempdb]; GO IF OBJECT_ID(N'dbo.t') IS NOT NULL BEGIN DROP TABLE dbo.t END GO CREATE TABLE dbo.t ( a NVARCHAR(8), b NVARCHAR(8) ); GO INSERT t VALUES ('a', 'b'); INSERT t VALUES ('a', 'b'); INSERT t VALUES ('a',…
Just a learner
  • 26,690
  • 50
  • 155
  • 234
14
votes
2 answers

Kafka Stream Suppress session-windowed-aggregation

I have written this code in a Kafka stream application: KGroupedStream groupedStream = stream.groupByKey(); groupedStream.windowedBy( SessionWindows.with(Duration.ofSeconds(3)).grace(Duration.ofSeconds(3))) .aggregate(() -> {...}) …
14
votes
1 answer

Select random row for each group in a postgres table

I have a table that is roughly: id | category | link | caption | image My goal is to fetch a random row from each distinct category in the table, for all the categories in the table. The plan is to then assign each row to a variable for its…
Steve
  • 325
  • 5
  • 13
13
votes
1 answer

How to aggregate over rolling time window with groups in Spark

I have some data that I want to group by a certain column, then aggregate a series of fields based on a rolling time window from the group. Here is some example data: df = spark.createDataFrame([Row(date='2016-01-01', group_by='group1', get_avg=5,…
Mike S
  • 935
  • 2
  • 6
  • 18