1

I'm trying to understand PostgreSQL (9.3) window functions a little bit better. Assume I have a simple table as:

SimpleTable
    id int,
    tservice timestamp

and wish to:

Select id, tservice , count(*) OVER (PARTITION BY id ....) as counter
from SimpleTable

where the records in SimpleTable have tservice times going back 40 years, but the count needs to be limited to only the three years before each record's tservice timestamp.

How can I produce a count for each record in SimpleTable?

Corollary question: How would the same query be changed to add a count of all records that have occurred three years prior to today's date?

Edit #1: Now I see where this question is to vague (I learned something :)). Using the below answer, I would like to get the 3yrs count and the count from the current date, something like:

                             3yrs prior   current date
1, 100, '2001-01-01 00:00:00', 0             0
2, 100, '2002-01-01 00:00:00', 1             0
3, 100, '2003-01-01 00:00:00', 2             0 
4, 100, '2004-01-01 00:00:00', 3             0
5, 100, '2005-01-01 00:00:00', 3             0              
6, 100, '2006-01-01 00:00:00', 3             0
7, 100, '2007-01-01 00:00:00', 3             0 
8, 100, '2008-01-01 00:00:00', 3             0
9, 100, '2009-01-01 00:00:00', 3             0
10, 100, '2010-01-01 00:00:00',3             0
11, 100, '2011-01-01 00:00:00',3             0
12, 100, '2012-01-01 00:00:00',3             0
13, 100, '2013-01-01 00:00:00',3             0
14, 100, '2014-01-01 00:00:00',3             1
15, 100, '2015-01-01 00:00:00',3             2
16, 100, '2016-01-01 00:00:00',3             3  (today is 2016-01-06)

Edit #2: This works to get the answer I need, but does not use a window partition. I'm thinking PostgreSQL does not implement the RANGE with intervals - which is what I think this problem requires.

     select s1.recid, s1.tservice, s1.client_recid, 
    (select count(*) from simpletable  s2 
        where (s1.tservice - s2.tservice)::INTERVAL <= interval '3 years' and
        s2.tservice < s1.tservice  and
        s2.client_recid = s1.client_recid)
from simpletable s1
order by client_recid, tservice

On a couple hundred thousand records, this takes about 10 seconds on my laptop. Is there a faster way?

Addendum Note: Using the functional approach with a cursor as outlined by Erwin reduced the execution time to 146ms. Thanks everybody for an excellent tutorial.

Alan Wayne
  • 5,122
  • 10
  • 52
  • 95
  • Regarding the COUNT of "the three years before each record's tservice" can you provide "sample data" and "expected result" please – Paul Maxwell Jan 06 '16 at 05:42

3 Answers3

1

Not really sure what you are aiming at and normally one would NOT partition by ID (assuming ID is unique per row). Typically you partition by some value that is shared by multiple rows. An example may help: SQL Fiddle

PostgreSQL 9.3 Schema Setup:

CREATE TABLE SimpleTable
    ("id" int, "client_id" int, "tservice" timestamp)
;

INSERT INTO SimpleTable
    ("id", "client_id", "tservice")
VALUES
    (1, 100, '2001-01-01 00:00:00'),
    (2, 100, '2002-01-01 00:00:00'),
    (3, 100, '2003-01-01 00:00:00'),
    (4, 100, '2004-01-01 00:00:00'),
    (5, 100, '2005-01-01 00:00:00'),
    (6, 100, '2006-01-01 00:00:00'),
    (7, 100, '2007-01-01 00:00:00'),
    (8, 100, '2008-01-01 00:00:00'),
    (9, 100, '2009-01-01 00:00:00'),
    (10, 100, '2010-01-01 00:00:00'),
    (11, 100, '2011-01-01 00:00:00'),
    (12, 100, '2012-01-01 00:00:00'),
    (13, 100, '2013-01-01 00:00:00'),
    (14, 100, '2014-01-01 00:00:00'),
    (15, 100, '2015-01-01 00:00:00'),
    (16, 100, '2016-01-01 00:00:00')
;

Query 1:

SELECT
      id
    , tservice
    , COUNT(*) OVER (PARTITION BY client_id) AS C1
    , COUNT(CASE WHEN tservice >= (CURRENT_DATE - INTERVAL '3 years') THEN 1 ELSE NULL END)
          OVER (PARTITION BY client_id) AS C3

FROM SimpleTable

Results:

| id |                  tservice | c1 | c3 |
|----|---------------------------|----|----|
|  1 | January, 01 2001 00:00:00 | 16 |  3 |
|  2 | January, 01 2002 00:00:00 | 16 |  3 |
|  3 | January, 01 2003 00:00:00 | 16 |  3 |
|  4 | January, 01 2004 00:00:00 | 16 |  3 |
|  5 | January, 01 2005 00:00:00 | 16 |  3 |
|  6 | January, 01 2006 00:00:00 | 16 |  3 |
|  7 | January, 01 2007 00:00:00 | 16 |  3 |
|  8 | January, 01 2008 00:00:00 | 16 |  3 |
|  9 | January, 01 2009 00:00:00 | 16 |  3 |
| 10 | January, 01 2010 00:00:00 | 16 |  3 |
| 11 | January, 01 2011 00:00:00 | 16 |  3 |
| 12 | January, 01 2012 00:00:00 | 16 |  3 |
| 13 | January, 01 2013 00:00:00 | 16 |  3 |
| 14 | January, 01 2014 00:00:00 | 16 |  3 |
| 15 | January, 01 2015 00:00:00 | 16 |  3 |
| 16 | January, 01 2016 00:00:00 | 16 |  3 |
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
1

What you have in mind is simply not possibly using the frame definition of a window function. (You were beginning to suspect as much.) The RANGE or ROWS clauses count distinct values or rows and have no concept of the meaning of values.

You want to count all rows that fall in a certain period of time and need to go about this differently. You could run a correlated subquery or a LATERAL subquery to count for each and every row, but that's expensive.

A smarter way would be to run through two cursors in parallel and keep a running count. I implemented exactly that for a very similar question:

Scales much better. I added detailed benchmarks over there.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

Here is "a way", mentioned above, using a LATERAL, to get the dynamic count that wasn't achieved using a windowed function.

SQL Fiddle

PostgreSQL 9.3 Schema Setup:

CREATE TABLE SimpleTable
    ("id" int, "client_id" int, "tservice" timestamp)
;

INSERT INTO SimpleTable
    ("id", "client_id", "tservice")
VALUES
    (1, 100, '2001-01-01 00:00:00'),
    (2, 100, '2002-01-01 00:00:00'),
    (3, 100, '2003-01-01 00:00:00'),
    (4, 100, '2004-01-01 00:00:00'),
    (5, 100, '2005-01-01 00:00:00'),
    (6, 100, '2006-01-01 00:00:00'),
    (7, 100, '2007-01-01 00:00:00'),
    (8, 100, '2008-01-01 00:00:00'),
    (9, 100, '2009-01-01 00:00:00'),
    (10, 100, '2010-01-01 00:00:00'),
    (11, 100, '2011-01-01 00:00:00'),
    (12, 100, '2012-01-01 00:00:00'),
    (13, 100, '2013-01-01 00:00:00'),
    (14, 100, '2014-01-01 00:00:00'),
    (15, 100, '2015-01-01 00:00:00'),
    (16, 100, '2016-01-01 00:00:00')
;

Query 1:

select
*
from SimpleTable
cross join lateral (
                    select count(*) as countLT3yrs
                    from SimpleTable st 
                    where st.client_id = SimpleTable.client_id
                    and st.tservice >= (SimpleTable.tservice - INTERVAL '3 years')
                    and st.tservice < SimpleTable.tservice
                    ) x

Results:

| id | client_id |                  tservice | countlt3yrs |
|----|-----------|---------------------------|-------------|
|  1 |       100 | January, 01 2001 00:00:00 |           0 |
|  2 |       100 | January, 01 2002 00:00:00 |           1 |
|  3 |       100 | January, 01 2003 00:00:00 |           2 |
|  4 |       100 | January, 01 2004 00:00:00 |           3 |
|  5 |       100 | January, 01 2005 00:00:00 |           3 |
|  6 |       100 | January, 01 2006 00:00:00 |           3 |
|  7 |       100 | January, 01 2007 00:00:00 |           3 |
|  8 |       100 | January, 01 2008 00:00:00 |           3 |
|  9 |       100 | January, 01 2009 00:00:00 |           3 |
| 10 |       100 | January, 01 2010 00:00:00 |           3 |
| 11 |       100 | January, 01 2011 00:00:00 |           3 |
| 12 |       100 | January, 01 2012 00:00:00 |           3 |
| 13 |       100 | January, 01 2013 00:00:00 |           3 |
| 14 |       100 | January, 01 2014 00:00:00 |           3 |
| 15 |       100 | January, 01 2015 00:00:00 |           3 |
| 16 |       100 | January, 01 2016 00:00:00 |           3 |
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51