Questions tagged [window-functions]

A window function is a type of SQL operation that aggregates over a partition of the result set.

As a basic example of a window function, consider a table that contains a list persons along with their age and gender.
If you wanted a query to return the complete list of people along with their age rank per gender, you could accomplish this most simply by using a ranking window function.
Such a function can be used to rank individual rows within a subset of the results.

Using a window function the above function would look like this:

SELECT name, 
       age, 
       gender, 
       RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS age_rank
FROM persons

Window functions as defined by the SQL standard can also access rows before and after the current row using the lag() or lead() function. The following query calculates the delta to the previous row for the amount column for each customer:

SELECT customer_id,
       order_date,
       amount, 
       amount - lag(amount) over (partition by customer_id ORDER BY order_date) as delta
FROM order
ORDER BY customer_id, order_date;

Standard aggregates (e.g. count(), sum()) can also be used as a window function. If combined with an ORDER BY (inside the partition defintion) this will generate "running" totals based on the aggregate being used.

Window functions are available in a wide range of DBMS: Oracle (called "analytical functions" there), Microsoft SQL Server, DB2 (called "olap functions" there) PostgreSQL, Teradata, Sybase, Vertica, and partially in H2 Database.

4082 questions
9
votes
3 answers

Spark Scala : Getting Cumulative Sum (Running Total) Using Analytical Functions

I am implementing the Cumulative Sum in Spark using Window Function. But the order of records input is not maintained while applying the window partition function Input data: val base = List(List("10", "MILLER", "1300", "2017-11-03"), List("10",…
RaAm
  • 1,072
  • 5
  • 22
  • 35
9
votes
2 answers

Conditional lead/lag function PostgreSQL?

I have a table like this: Name activity time user1 A1 12:00 user1 E3 12:01 user1 A2 12:02 user2 A1 10:05 user2 A2 10:06 user2 A3 10:07 user2 M6 10:07 user2 B1 10:08 user3 A1 …
KolM
  • 101
  • 1
  • 1
  • 2
9
votes
1 answer

Can I put a condition on a window function in Redshift?

I have an events-based table in Redshift. I want to tie all events to the FIRST event in the series, provided that event was in the N-hours preceding this event. If all I cared about was the very first row, I'd simply do: SELECT event_time …
ScottieB
  • 3,958
  • 6
  • 42
  • 60
9
votes
1 answer

ROW_Count() To Start Over Based On Order

Create Table #Test ( ID Int Primary Key Identity, Category VarChar(100) ) Insert into…
Warpnow
  • 93
  • 2
9
votes
5 answers

PostgreSQL - column value changed - select query optimization

Say we have a table: CREATE TABLE p ( id serial NOT NULL, val boolean NOT NULL, PRIMARY KEY (id) ); Populated with some rows: insert into p (val) values (true),(false),(false),(true),(true),(true),(false); ID VAL 1 1 2 0 3 0 4 …
Nailgun
  • 3,999
  • 4
  • 31
  • 46
9
votes
2 answers

Error: invalid input syntax for integer: ""

I have this table tbl_buku: id_buku judul_buku tahun_buku 1 Bioogi 2010 2 Fisika 2010 3 Informatika 2012 4 Kimia 2012 I use query like this, but I am getting an error: select…
Trisna
  • 385
  • 1
  • 5
  • 15
9
votes
1 answer

Rank function in MySQL with Order By clause

How could this (Oracle) SQL: select a.*, rank() over (partition by a.field1 order by a.field2 desc) field_rank from table_a a order by a.field1, a.field2 be translated into MySQL? This question seems to be similar but there is no Order By in the…
user1433877
  • 145
  • 2
  • 8
8
votes
4 answers

Is there an equivalent to the F# Seq.windowed in C#?

I am working on some C# code dealing with problems like moving averages, where I often need to take a List / IEnumerable and work on chunks of consecutive data. The F# Seq module has a great function, windowed, which taking in a Sequence, returns a…
Mathias
  • 15,191
  • 9
  • 60
  • 92
8
votes
2 answers

Selecting positive aggregate value and ignoring negative in Postgres SQL

I must apply a certain transformation fn(argument). Here argument is equal to value, but not when it is negative. When you get a first negative value, then you "wait" until it sums up with consecutive values and this sum becomes positive. Then you…
zamza
  • 337
  • 1
  • 2
  • 12
8
votes
2 answers

Optimizing a Vertica SQL query to do running totals

I have a table S with time series data like this: key day delta For a given key, it's possible but unlikely that days will be missing. I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of…
user879681
  • 81
  • 1
  • 3
8
votes
1 answer

Filtering on a Window function in Django

I have the following model: class Foobar(models.Model): foo = models.IntegerField() And I figured out how to calculate the delta of consecutive foo fields by using window functions: qs = Foobar.objects.annotate( delta=F('foo') - Window( …
Code-Apprentice
  • 81,660
  • 23
  • 145
  • 268
8
votes
4 answers

SQL Find pairs of rows with next best timestamp match

My challenge is to find pairs of rows that are adjacent by timestamp and keep only those pairs with minimal distance of a value field (positive values of the difference) A table measurement collects data from different sensors with a timestamp and a…
Daniel
  • 81
  • 2
8
votes
4 answers

PySpark Window function on entire data frame

Consider a PySpark data frame. I would like to summarize the entire data frame, per column, and append the result for every row. +-----+----------+-----------+ |index| col1| col2 | +-----+----------+-----------+ | …
Kenny
  • 1,902
  • 6
  • 32
  • 61
8
votes
2 answers

Snowflake: "SQL compilation error:... is not a valid group by expression"

Without resorting to CTEs or a sub-query is there any way to use Window functionality with a different summary level than the GROUP BY? COUNT(*) works, but if a column name is specified in the COUNT or the SUM function is used, the query errors…
HarryD
  • 103
  • 1
  • 2
  • 6
8
votes
1 answer

Select previous row which satisfies a condition in hive

I have product data like this Product Date Sales Availbility xyz 2017-12-31 724.5 6.0 xyz 2018-01-07 362.25 7.0 xyz 2018-01-14 281.75 7.0 xyz 2018-01-21 442.75 7.0 xyz …
av abhishiek
  • 647
  • 2
  • 11
  • 26