SELECT query that uses data from another row

Question

I have a table issue with following structure:

+----+---------+-------------+------------+
| id | project | new_status  | updated_at |
+----+---------+-------------+------------+
| 1  | 1       | New         | 12:41:18   |
| 1  | 1       | In progress | 12:47:43   |
| 1  | 1       | Resolved    | 17:05:29   |
+----+---------+-------------+------------+

I need to implement a query that returns time that every issue of particular project spent in every status, something like that:

+----+---------+-------------+------------+
| id | project | new_status  | time_diff  |
+----+---------+-------------+------------+
| 1  | 1       | New         | 00:06:25   |
| 1  | 1       | In progress | 04:17:46   |
+----+---------+-------------+------------+

How can I get this? Preferably no special-concrete-db-features should be used, i.e. only pure SQL. But if it matters - I'm using PostgreSQL.

As always, your Postgres version and a table definition (what you get with `\d issue` in psql) should be in the question. Sample data is useful, but we can't see data types and constraints. And you did not define how to deal with project that are not yet resolved. — Erwin Brandstetter, Sep 30 '15 at 02:15

score 2 · Answer 1 · edited May 23 '17 at 10:27

I write this query on the fly so it is not tested:

SELECT id, project, new_status, (updated_at - nextUpdate) AS time_diff
                           --or CAST((updated_at - nextUpdate) AS time) AS time_diff
FROM (
    SELECT *, 
        LEAD(updated_at) OVER (PARTITION BY project ORDER BY updated_at) AS nextUpdate
    FROM yourTable) dt
WHERE nextUpdate IS NOT NULL;

A related answer is this.

score 2 · Answer 2 · answered Sep 30 '15 at 02:29

Assuming the current time from your current time zone if the next step is missing:

SELECT *
FROM  (
   SELECT *, lead(updated_at, 1, now()::time(0)) OVER (PARTITION BY id, project
                                                       ORDER BY updated_at)
           - updated_at AS time_diff
   FROM   issue
   ) sub
WHERE  new_status <> 'Resolved'  -- hide 'Resolved' row
ORDER  BY updated_at;

You example shows time values, which is typically a problematic choice. (What if events span multiple days?) Consider timestamp or timestamptz instead.

The window functions lead() and lag() can provide default values if there is no next or previous row. I use the current time without fractional seconds.

score 1 · Answer 3 · answered Sep 29 '15 at 16:55

Probably a simple second table with project id, step name, and a start time and a stop time. Some way of ordering the steps, to, such as a simple sequence of numbers. Then run a join between the two tables on project id, ordered by step id, with the start time of the step subtracted from the end time. Use a case statement to show the current step has no end time.o

score 1 · Accepted Answer · answered Sep 30 '15 at 07:04

Since you already have some examples of doing it with lead() functions (which are definitely database specific), here's a different option: http://sqlfiddle.com/#!15/497de/18

with t1 (id, project, new_status, updated_at, ndx) as (
  select id, project, new_status, updated_at,
  row_number() over (partition by id, project order by updated_at)
  from issue
)
,
t2 (id, project, new_status, starttime, endtime) as (
  select t1.id, t1.project, t1.new_status, t1.updated_at, t2.updated_at
  from t1
  left join t1 t2
  on t2.id = t1.id
  and t2.project = t1.project
  and t2.ndx = t1.ndx + 1
)
,
t3 (id, project, new_status, time_diff) as (
  select id, project, new_status, endtime - starttime
  from t2
)
select id, project, new_status, to_char(time_diff, 'HH24:MI:SS') as time_diff
from t3
where time_diff is not null

This option uses common table expressions to create an index using row_number() for each of your projects, and then left joins the table to itself based on that index; that is t2.ndx = t1.ndx + 1.

From there, it's a matter of calculating the difference in time and formatting it for display.

If you'd like to see how much time has passed with the issue at 'Resolved' status, then use something like coalesce(t2.updated_at,localtime) to get the current time if t2.updated_at is null.

The window function `lead()` is standard SQL. Of the big RDMS only MySQL doesn't support it - which doesn't keep up with development, it wouldn't support the CTEs in your query either. — Erwin Brandstetter, Sep 30 '15 at 14:17

SELECT query that uses data from another row

4 Answers4