1

I have a table of transactions in an Oracle database. I am attempting to pull a report together for a delivery system involving a number of transaction types. The "request" type can actually be one of four sub-types ('A', 'B', 'C', and 'D' for this example), and the "delivery" type can be one of four different sub-types ('PULL', 'PICKUP', 'MAIL'). There can be anywhere from 1 to 5 transactions to get an item from "request" to "delivery, and a number of the "delivery" types are also intermediary transactions. Example:

Item | Transaction | Timestamp
001  | REQ-A       | 2014-07-31T09:51:32Z
002  | REQ-B       | 2014-07-31T09:55:53Z
003  | REQ-C       | 2014-07-31T10:01:15Z
004  | REQ-D       | 2014-07-31T10:02:29Z
005  | REQ-A       | 2014-07-31T10:05:47Z
002  | PULL        | 2014-07-31T10:20:04Z
002  | MAIL        | 2014-07-31T10:20:06Z
001  | PULL        | 2014-07-31T10:22:21Z
001  | TRANSFER    | 2014-07-31T10:22:23Z
003  | PULL        | 2014-07-31T10:24:10Z
003  | TRANSFER    | 2014-07-31T10:24:12Z
004  | PULL        | 2014-07-31T10:26:28Z
005  | PULL        | 2014-07-31T10:28:42Z
005  | TRANSFER    | 2014-07-31T10:28:44Z
001  | ARRIVE      | 2014-07-31T11:45:01Z
001  | PICKUP      | 2014-07-31T11:45:02Z
003  | ARRIVE      | 2014-07-31T11:47:44Z
003  | PICKUP      | 2014-07-31T11:47:45Z
005  | ARRIVE      | 2014-07-31T11:49:45Z
005  | PICKUP      | 2014-07-31T11:49:46Z

What I need is a report like:

Item | Start Tx | End Tx | Time
001  | REQ-A    | PICKUP | 1:53:30
002  | REQ-B    | MAIL   | 0:24:13
003  | REQ-C    | PICKUP | 1:46:30
004  | REQ-D    | PULL   | 0:23:59
005  | REQ-A    | PICKUP | 1:43:59

What I have:

Item | Start Tx | End Tx   | Time
001  | REQ-A    | PULL     | 0:30:49
001  | REQ-A    | TRANSFER | 0:30:51
001  | REQ-A    | ARRIVE   | 1:53:29
001  | REQ-A    | PICKUP   | 1:53:30
002  | REQ-B    | PULL     | 0:24:11
002  | REQ-B    | MAIL     | 0:24:13
003  | REQ-C    | PULL     | 0:22:55
003  | REQ-C    | TRANSFER | 0:22:57
003  | REQ-C    | ARRIVE   | 1:46:29
003  | REQ-C    | PICKUP   | 1:46:30
004  | REQ-D    | PULL     | 0:23:59
005  | REQ-A    | PULL     | 0:22:55
005  | REQ-A    | TRANSFER | 0:22:57
005  | REQ-A    | ARRIVE   | 1:43:58
005  | REQ-A    | PICKUP   | 1:43:59

What I'm doing to get that data:

SELECT Item, Transaction, nextTransaction, nextTimestamp - Timestamp
FROM (
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 5) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 5) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 4) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 4) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 3) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 3) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 2) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 2) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 1) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 1) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
)
WHERE nextTransaction IS NOT NULL
AND Transaction IN ('REQ-A', 'REQ-B', 'REQ-C', 'REQ-D')

I could manually parse this in a script (and perhaps that's actually the best course of action), but for the sake of learning, I'd like to know if it's possible to actually do this with SQL alone.

ND Geek
  • 398
  • 6
  • 21

2 Answers2

4

It sounds like you want the first and last transactions based on time. I think the following does what you want:

select item,
       min(transaction) keep (dense_rank first order by timestamp) as StartTx, 
       min(transaction) keep (dense_rank last order by timestamp) as EndTx,
       max(timestamp) - min(timestamp)
from transactions t
group by item;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I always forget about `keep` . Much simpler. – Alex Poole Jul 31 '14 at 16:34
  • I apparently left out some rather important information...this is loan data, rather than one time transaction/sale data. That means a given item may show up 5 or 6 times in a given time period. Otherwise, this is a very elegant solution. I wasn't familiar with either KEEP or DENSE_RANK, so knowing those will definitely come in handy. – ND Geek Jul 31 '14 at 17:57
  • 1
    @NDGeek . . . I might suggest that you ask another question rather than editing this one. You already have several answers, and changing a question after it has answers is impolite, especially if it invalidates the answers. – Gordon Linoff Jul 31 '14 at 18:04
  • 1
    Apologies, I've edited away my edit, and will accept your answer as it best fits my original question. – ND Geek Jul 31 '14 at 18:25
  • I've posted a nearly-duplicate but importantly distinct question [over here](http://stackoverflow.com/q/25067292/1463788). – ND Geek Jul 31 '14 at 19:29
  • @NDGeek . . . I'm confused. The above query works when the items appear multiple times. What is the problem that you are having with it? – Gordon Linoff Jul 31 '14 at 19:53
1

You could use the first_value analytic function instead of lead:

select item, start_tran, end_tran, end_time - start_time
from (
  select item,
    first_value(transaction) over (partition by item
      order by timestamp) as start_tran,
    first_value(timestamp) over (partition by item
      order by timestamp) as start_time,
    first_value(transaction) over (partition by item
       order by timestamp desc) as end_tran,
    first_value(timestamp) over (partition by item
       order by timestamp desc) as end_time,
    row_number() over (partition by item
       order by timestamp) as rn
  from transactions
)
where rn = 1
order by item;

      ITEM START_TRAN END_TRAN   END_TIME-START_TIME
---------- ---------- ---------- -------------------
         1 REQ-A      PICKUP     0 1:53:30.0         
         2 REQ-B      MAIL       0 0:24:13.0         
         3 REQ-C      PICKUP     0 1:46:30.0         
         4 REQ-D      PULL       0 0:23:59.0         
         5 REQ-A      PICKUP     0 1:43:59.0         

The row_number is instead of a distinct. The inner query generates one row for each of the rows in your original table, with just the item and the analytic function results, so they are all the same for each item; item 1 has five identical rows, each showing the first and last transaction and corresponding timestamp. The outer query mostly collapses those, but also does the timestamp subtraction to get the elapsed interval.

SQL Fiddle doesn't show intervals very well, but you could extract the hours/minutes/seconds values from that if you wanted to present it differently. Or if the column is actually a date then you can just use to_char() of course.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318