2

Is it possible to write an aggregate function in PostgreSQL that calculates a delta value by subtracting the initial value (the last value in the column) from the current value (the first value in the column)? It would apply to a structure like this:

rankings (userId, rank, timestamp)

And could be used like this:

SELECT userId, custom_agg(rank) OVER w
FROM rankings
WINDOW w AS (PARTITION BY userId ORDER BY timestamp DESC)

returning for a userId the rank of the newest entry (by timestamp) - rank of the oldest entry (by timestamp)

Thanks!

maephisto

2 Answers

2

the rank of the newest entry (by timestamp) - rank of the oldest entry (by timestamp)

There are many ways to achieve this with existing functions. You can use the existing window functions first_value() and last_value(), combined with DISTINCT or DISTINCT ON to get it without joins and subqueries:

SELECT DISTINCT ON (userid)
       userid
     , last_value(rank) OVER w  
     - first_value(rank) OVER w AS rank_delta
FROM   rankings
WINDOW w AS (PARTITION BY userid ORDER BY ts
             ROWS BETWEEN UNBOUNDED PRECEDING
             AND  UNBOUNDED FOLLOWING);

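Note the custom frames for the window functions! With the default frame, which ends at the current row (or its last peer), last_value() would just return the rank of the current row instead of the newest one.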
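To see what the frame changes, here is a minimal sketch on made-up sample data. The three rows and the column aliases are assumptions for illustration only; the table and column names follow the query above.

```sql
-- Hypothetical sample data, not taken from the question.
CREATE TEMP TABLE rankings (userid int, rank int, ts timestamp);
INSERT INTO rankings VALUES
  (1, 10, '2014-02-01'),
  (1, 12, '2014-02-02'),
  (1, 15, '2014-02-03');

SELECT ts
       -- default frame: from the partition start up to the current row
     , last_value(rank) OVER (PARTITION BY userid ORDER BY ts) AS default_frame
       -- explicit frame: the whole partition
     , last_value(rank) OVER (PARTITION BY userid ORDER BY ts
                              ROWS BETWEEN UNBOUNDED PRECEDING
                              AND  UNBOUNDED FOLLOWING) AS full_frame
FROM   rankings
ORDER  BY ts;
-- default_frame: 10, 12, 15   (just the current row's rank)
-- full_frame:    15, 15, 15   (the newest rank on every row)
```

Only with the full frame is the result identical on every row of a user, which is what lets DISTINCT ON pick an arbitrary row per userid.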

Or you can use basic aggregate functions in a subquery and JOIN:

SELECT userid, r2.rank - r1.rank AS rank_delta
FROM  (
  SELECT userid
       , min(ts) AS first_ts
       , max(ts) AS last_ts
   FROM  rankings
   GROUP BY 1
   ) sub
JOIN   rankings r1 USING (userid)
JOIN   rankings r2 USING (userid)
WHERE  r1.ts = first_ts
AND    r2.ts = last_ts;

Assuming unique (userid, ts), or your requirements would be ambiguous.
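The join above matches rows on ts, so two rows with the same ts for one user would produce duplicate output rows. A quick way to check for such ties, as a sketch against the same rankings table (the alias n is just illustrative):

```sql
-- Lists every (userid, ts) pair that occurs more than once.
-- An empty result means "oldest" and "newest" are well defined per user.
SELECT userid, ts, count(*) AS n
FROM   rankings
GROUP  BY userid, ts
HAVING count(*) > 1;
```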


Shichinin no samurai

... a.k.a. "7 Samurai"
Per request in the comments, the same for only the last seven rows per userid (or as many as can be found, if there are fewer):

Again, one of many possible ways. But I believe this to be one of the shortest:

SELECT DISTINCT ON (userid)
       userid
     , first_value(rank) OVER w  
     - last_value(rank)  OVER w AS rank_delta
FROM   rankings
WINDOW w AS (PARTITION BY userid ORDER BY ts DESC
             ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING)
ORDER  BY userid, ts DESC;

Note the reversed sort order. The first row is the "newest" entry. I span a frame of (max.) 7 rows and pick only the results for the newest entry with DISTINCT ON.
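If the frame arithmetic feels too implicit, the same idea can probably be spelled out with row_number(). This is an alternative sketch, not part of the original answer; the subquery alias sub and the column rn are made up for illustration.

```sql
SELECT DISTINCT ON (userid)
       userid
       -- newest rank minus the rank of the oldest row that survived the filter
     , first_value(rank) OVER w
     - last_value(rank)  OVER w AS rank_delta
FROM  (
   SELECT userid, rank, ts
          -- number each user's rows from newest (1) to oldest
        , row_number() OVER (PARTITION BY userid ORDER BY ts DESC) AS rn
   FROM   rankings
   ) sub
WHERE  rn <= 7   -- keep at most the newest seven rows per user
WINDOW w AS (PARTITION BY userid ORDER BY ts DESC
             ROWS BETWEEN UNBOUNDED PRECEDING
             AND  UNBOUNDED FOLLOWING);
```

Because the filter runs before the outer window functions, the frame can span the whole (already trimmed) partition, and every row of a user carries the same result.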


Erwin Brandstetter
  • Thanks Erwin! I tried to adapt your first solution, the one using windows, to consider only the newest 7 rankings instead of all of them. I did that by changing the frame to ROWS BETWEEN 7 PRECEDING AND 0 FOLLOWING, but somehow I still get all rows, not just the top 7 (ordered by timestamp). Any idea why? – maephisto Feb 27 '14 at 13:07
  • @maephisto: If you adapt the frame, you get varying results per `userid`. My solution builds on identical results per `userid`. Are the "newest 7" supposed to be relative to each row or absolute for the complete table? – Erwin Brandstetter Feb 27 '14 at 13:52
  • It should be the newest 7 entries for a userId. Only those should be taken into consideration; everything older than the 7th newest entry has no value – maephisto Feb 27 '14 at 14:10
  • @maephisto: Do all of them have 7 or more, or can there be fewer? – Erwin Brandstetter Feb 27 '14 at 15:27
  • It should be possible to have fewer than 7 – maephisto Feb 27 '14 at 15:33
  • @maephisto: I added a solution for just 7. :) – Erwin Brandstetter Feb 27 '14 at 15:54
1

You can do it with JOIN and DISTINCT ON in Postgres. The GRP subquery gives you the latest rank value for each userId, so just join it with rankings on userId and subtract the values.

SELECT rankings.userId, 
       rankings.rank-GRP.rank as delta,
       rankings.timestamp
FROM rankings
JOIN
(
    SELECT DISTINCT ON (userId)  userId, rank, timestamp
    FROM rankings
    ORDER BY userId, timestamp DESC
) as GRP ON rankings.userId=GRP.userId
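The query above returns one delta per ranking row, each relative to the user's latest rank. If a single newest-minus-oldest value per user is wanted, as the question describes, the same DISTINCT ON idea can be applied twice. This is a sketch under that assumption; the aliases newest and oldest are made up for illustration.

```sql
SELECT newest.userId
     , newest.rank - oldest.rank AS delta
FROM  (
   -- latest row per user
   SELECT DISTINCT ON (userId) userId, rank
   FROM   rankings
   ORDER  BY userId, timestamp DESC
   ) AS newest
JOIN  (
   -- earliest row per user
   SELECT DISTINCT ON (userId) userId, rank
   FROM   rankings
   ORDER  BY userId, timestamp
   ) AS oldest USING (userId);
```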


valex