PostgreSQL: Window functions using column alias

Question

I have a following table :

                                    Table "public.activity"

   Column   |            Type             |           Modifiers                       
------------+-----------------------------+-------------------------------------------------------
 id         | integer                     | not null default nextval('activity_id_seq'::regclass)
 scheduleid | integer                     | 
 name       | text                        | 
 duedate    | timestamp without time zone | 
Indexes:
    "activity_pkey" PRIMARY KEY, btree (id)

With the following data:

 id | scheduleid |   name   |          duedate           
----+------------+----------+----------------------------
  1 |          1 | ACT1     | 2015-09-21 13:34:53.738449
  2 |          1 | ACT1     | 2015-09-20 13:35:02.770369
  3 |          1 | ACT1     | 2015-09-19 13:35:07.650204
  4 |          1 | ACT1     | 2015-09-18 13:35:11.930225
  5 |          1 | ACT1.0.0 | 2015-09-17 13:35:48.033791
  6 |          1 | ACT1.0.0 | 2015-09-16 13:35:51.55382
  7 |          2 | ACT2.0.0 | 2015-09-21 13:36:56.42534
  8 |          2 | ACT2.0.0 | 2015-09-28 13:37:21.065071
  9 |          2 | ACT2.0.0 | 2015-10-05 13:37:26.753227
 10 |          2 | ACT2.0.0 | 2015-10-12 13:37:30.656846
 11 |          2 | ACT2.0.0 | 2015-10-19 13:37:34.54473
 12 |          2 | ACT2.0.0 | 2015-10-26 13:37:38.192843
(12 rows)

For each scheduleId, we have activities created.

I need to display the latest unique activity for each schedule along with the count of activities which are clubbed under it.

Following query using Postgres window functions does its job.

WITH TOP_ACTIVITIES AS (
    SELECT DISTINCT ON (scheduleid, name)
    id, scheduleid, name, duedate,
    count(*) over(partition by scheduleid, name) as clubbedcount
    from activity ORDER BY scheduleid, name, duedate desc
)
select * from TOP_ACTIVITIES;

The result is as follows:

id | scheduleid |   name   |          duedate           | clubbedcount 
----+------------+----------+----------------------------+--------------
  1 |          1 | ACT1     | 2015-09-21 13:34:53.738449 |            4
  5 |          1 | ACT1.0.0 | 2015-09-17 13:35:48.033791 |            2
 12 |          2 | ACT2.0.0 | 2015-10-26 13:37:38.192843 |            6

So far so good :P

Now a small twist is that, we need to group activities by their rangeTag too

Eg: Todays date being 21-Sep-2015,
activities with duedate <= now() --> club under TODAY tag
activities with duedate <= now() + 7 days --> club under THIS WEEK tag
activities with duedate <= now() + 1 month --> club under THIS MONTH tag
ELSE --> club under FUTURE tag

Thus we need the 1. top shelf activity for each partition defined by rangeTag, scheduleid and name 2. Count of activities, which are collated for each partition into the top activity.

Slightly modifying my query to:

WITH TOP_ACTIVITIES AS (
     SELECT DISTINCT ON (range, scheduleid, name)
     id, scheduleid, name, duedate,

     CASE WHEN duedate < now() THEN 'TODAY'
          WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
          WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
          ELSE 'FUTURE' 
     END AS range,

     count(*) over(partition by scheduleid, name)


     from activity ORDER BY range, scheduleid, name,duedate desc
)
select * from TOP_ACTIVITIES ORDER BY scheduleid;

gives me the NEAR desired result, except count :P

 id | scheduleid |   name   |          duedate           |   range    | count 
----+------------+----------+----------------------------+------------+-------
  1 |          1 | ACT1     | 2015-09-21 13:34:53.738449 | TODAY      |     4
  5 |          1 | ACT1.0.0 | 2015-09-17 13:35:48.033791 | TODAY      |     2
 12 |          2 | ACT2.0.0 | 2015-10-26 13:37:38.192843 | FUTURE     |     6
 11 |          2 | ACT2.0.0 | 2015-10-19 13:37:34.54473  | THIS MONTH |     6
  8 |          2 | ACT2.0.0 | 2015-09-28 13:37:21.065071 | THIS WEEK  |     6
  7 |          2 | ACT2.0.0 | 2015-09-21 13:36:56.42534  | TODAY      |     6

I need count partitioned by "range" too.

BUT, replacing

count(*) over(partition by scheduleid, name)

with

count(*) over(partition by range, scheduleid, name)

doesn't work.

Error is

ERROR: column "range" does not exist
LINE 9: count(*) over(partition by range,scheduleid, name)

just like anywhere else, you can't reference one `select`-list term from another, or reference `select`-list entries from the `where` clause etc. You need a subquery or another CTE term. — Craig Ringer, Sep 21 '15 at 10:24

score 2 · Answer 1 · answered Sep 21 '15 at 10:28

Move count() (and DISTINCT ON) to a new query:

WITH top_activities AS (
    SELECT 
        id, scheduleid, name, duedate,
        CASE WHEN duedate < now() THEN 'TODAY'
            WHEN duedate < now() + interval '7 days' THEN 'THIS WEEK'
            WHEN duedate < now() + interval '1 month' THEN 'THIS MONTH'
            ELSE 'FUTURE'  
        END AS range
    FROM activity ORDER BY range, scheduleid, name,duedate desc
    ),
top_activities_with_count as (  
    SELECT DISTINCT ON (range, scheduleid, name)
        *, count(*) over(partition by range, scheduleid, name)
    FROM top_activities
    )
SELECT * FROM top_activities_with_count ORDER BY scheduleid;

PostgreSQL: Window functions using column alias

1 Answers1