
I have transaction data that is timed. It has several job numbers. Sometimes job numbers are split into two pieces, with another job run between them. I would like to report those split jobs on two different lines with their own statistics.

I have tried a number of different window-based solutions; it seems like FIRST_VALUE() and LAST_VALUE() are my best options. I would like the first and last transaction time of a job to be available in columns, so I can then group on them and show the number of transactions.

When I use these, even though I partition by the job, the KitStart and KitEnd values act as if I had grouped over the whole job. I am hoping to group, but with the job split into its separate runs.

select  FIRST_VALUE(DTIMECRE) OVER(PARTITION BY job  ORDER BY dtimecre) AS KitStart,
        LAST_VALUE(DTIMECRE)  OVER(PARTITION BY job  ORDER BY dtimecre
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS KitEnd,
        job,
        dtimecre,
        SystemicLocation
from transactions
order by dtimecre

    KitStart            KitEnd              Job     dtimecre            SystemicLocation
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:07    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:08    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:09    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:10    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:10    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:11    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:12    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:13    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:13    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:14    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:15    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:16    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:46    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:47    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:48    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:49    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:00:49    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:06:17    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:16    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:26    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:32    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:39    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:12:45    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:38    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:45    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:50    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:13:55    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:14:00    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:14:06    3
    5/15/19 11:06:17    5/15/19 11:14:11    979309  5/15/19 11:14:11    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:35:51    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:35:51    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:35:52    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 11:36:23    3
    Lots of transactions……              
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 15:17:19    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 15:17:19    3
    5/15/19 11:00:07    5/15/19 15:17:20    978437  5/15/19 15:17:20    3

Looking at the data: the first KitStart is 11:00:07, it then transitions to 11:06:17 when the next job (979309) comes up. But it goes back to 11:00:07 when job 978437 starts again. I would like it to be the first time in that second block of kitting, therefore 11:35:51.

Same problem with LAST_VALUE. At the first transaction it is 15:17:20, which is the end of the second round of job 978437. I want it to be 11:00:49, the end of the first block.

To summarize the output I am looking for, it will look like:

    KitStart            KitEnd              Job    QtyKitted
    5/15/19 11:00:07    5/15/19 11:00:49    978437  17
    5/15/19 11:06:17    5/15/19 11:14:11    979309  13
    5/15/19 11:35:51    5/15/19 15:17:20    978437  1007

This shows that job 978437 was started, ran 17 units, switched to job 979309, ran 13, then switched back to 978437 and ran 1007 units.
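To make the behavior concrete, here is a minimal sketch of the problem using made-up toy data (SQLite >= 3.25 via Python, which supports the same window functions): PARTITION BY job lumps both blocks of the split job into one partition, so every row of job 978437 reports the same KitStart and KitEnd.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (job INTEGER, dtimecre TEXT)")
rows = [
    (978437, "11:00:07"), (978437, "11:00:49"),   # first block of 978437
    (979309, "11:06:17"), (979309, "11:14:11"),   # intervening job
    (978437, "11:35:51"), (978437, "15:17:20"),   # second block of 978437
]
con.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

# PARTITION BY job ignores the interruption: both blocks of 978437 share one
# partition, so FIRST_VALUE/LAST_VALUE span the whole job, not each block.
result = con.execute("""
    SELECT job, dtimecre,
           FIRST_VALUE(dtimecre) OVER (PARTITION BY job ORDER BY dtimecre) AS KitStart,
           LAST_VALUE(dtimecre)  OVER (PARTITION BY job ORDER BY dtimecre
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS KitEnd
    FROM transactions
    ORDER BY dtimecre
""").fetchall()
for r in result:
    print(r)
```

Note the row for 11:35:51 (start of the second block) still reports KitStart 11:00:07, exactly the behavior described above.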

Also, this being my first SQL server post (and 2nd post ever) thanks for putting up with any nonconformities to Stackoverflow postings, I probably have a couple. Thanks!

2 Answers


I found the answer thanks to Group consecutive rows of same value using time spans

I added another column (Ranker) to make a separate group for each consecutive run of the same prod_id as time goes by.

with A as (
    select prod_id, sku, dtimecre, systemiclocation,
    prevProd_id = lag(prod_id, 1, prod_id)  over (order by dtimecre)
    from transactions
),
B as  (
    select prod_id, sku, dtimecre, systemiclocation,
    Ranker = SUM(CASE WHEN prod_id = Prevprod_id THEN 0 ELSE 1 END)
                OVER (order by dtimecre)
    FROM   A
    )

select prod_id, sku, min(dtimecre) as KitStart, max(dtimecre) as KitEnd,
       count(*) as QtyKitted
from B
group by prod_id, sku, Ranker
order by min(dtimecre)

Produced:

prod_id KitStart                    KitEnd                      QtyKitted
978437  2019-05-15 11:00:07.0000000 2019-05-15 11:00:49.0000000 17
979309  2019-05-15 11:06:17.0000000 2019-05-15 11:14:11.0000000 13
978437  2019-05-15 11:35:51.0000000 2019-05-15 15:17:20.0000000 1007
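The same gaps-and-islands pattern can be checked end to end on toy data (a sketch using SQLite, which supports LAG and windowed SUM; times and counts here are illustrative, not the real data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (prod_id INTEGER, dtimecre TEXT)")
rows = [(978437, "11:00:07"), (978437, "11:00:49"),
        (979309, "11:06:17"), (979309, "11:14:11"),
        (978437, "11:35:51"), (978437, "15:17:20")]
con.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

# Ranker = running count of prod_id changes; it stays constant within one
# consecutive block, so grouping by (prod_id, Ranker) splits the blocks.
islands = con.execute("""
    WITH A AS (
        SELECT prod_id, dtimecre,
               LAG(prod_id, 1, prod_id) OVER (ORDER BY dtimecre) AS prevProd_id
        FROM transactions
    ),
    B AS (
        SELECT prod_id, dtimecre,
               SUM(CASE WHEN prod_id = prevProd_id THEN 0 ELSE 1 END)
                   OVER (ORDER BY dtimecre) AS Ranker
        FROM A
    )
    SELECT prod_id, MIN(dtimecre) AS KitStart, MAX(dtimecre) AS KitEnd,
           COUNT(*) AS QtyKitted
    FROM B
    GROUP BY prod_id, Ranker
    ORDER BY MIN(dtimecre)
""").fetchall()
for r in islands:
    print(r)
```

The two blocks of 978437 come out as separate rows with their own start, end, and count, with the 979309 run between them.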
  • Congrats, I just had time to start working on it. I figured a CTE would probably be part of the solution. – PausePause May 17 '19 at 21:24

The question is kind of confusing, but it seems to me that you want the MIN(KitStart) and MAX(KitEnd) for each Job, regardless of interruptions. If that's the case then a GROUP BY query should satisfy these requirements.

SELECT MIN(DTIMECRE) AS KitStart,
       MAX(DTIMECRE) AS KitEnd,
       job, 
       sku,  
       SystemicLocation 
FROM transactions
GROUP BY job, 
sku,   
SystemicLocation 
ORDER BY MIN(DTIMECRE)

However, when doing this you lose the ability to actually see DTimeCRE as a standalone column. Based on your provided data, you'd probably get one row per job, which is not what you want.

To work around this problem I'd suggest using common table expressions (CTE). I'm making the assumption that Job is a unique identifier and you wouldn't reuse a job number for more than one job. If not, you can modify the query yourself.

WITH MINDTimeCRE AS
(SELECT MIN(DTIMECRE) DTIMECRE,
       Job
       FROM transactions
       GROUP BY Job)
,MAXDTimeCRE AS 
(SELECT MAX(DTIMECRE) DTIMECRE,
       Job
       FROM transactions
       GROUP BY Job)
SELECT MINDTimeCRE.DTIMECRE AS KitStart,
       MAXDTimeCRE.DTIMECRE AS KitEnd,
       transactions.job, 
       sku,  
       transactions.DTIMECRE,
       SystemicLocation 
FROM transactions
  LEFT JOIN MINDTimeCRE 
  ON transactions.Job = MINDTimeCRE.Job
  LEFT JOIN MAXDTimeCRE 
  ON transactions.Job = MAXDTimeCRE.Job

This treats the two CTEs almost as if they were tables, and after joining them to the query with LEFT JOIN you can reference their MIN/MAX values as ordinary columns without needing aggregation in the outer query.
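As a quick sanity check, here is the same per-job MIN/MAX CTE join on toy data (a sketch using SQLite via Python; note this version deliberately ignores splits, so both blocks of 978437 share one KitStart/KitEnd):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (job INTEGER, dtimecre TEXT)")
con.executemany("INSERT INTO transactions VALUES (?, ?)",
                [(978437, "11:00:07"), (979309, "11:06:17"),
                 (978437, "11:35:51"), (978437, "15:17:20")])

# Each CTE collapses transactions to one MIN or MAX row per job; the LEFT
# JOINs then attach those values to every transaction row.
joined = con.execute("""
    WITH MINDTimeCRE AS (
        SELECT MIN(dtimecre) AS dtimecre, job FROM transactions GROUP BY job
    ),
    MAXDTimeCRE AS (
        SELECT MAX(dtimecre) AS dtimecre, job FROM transactions GROUP BY job
    )
    SELECT MINDTimeCRE.dtimecre AS KitStart,
           MAXDTimeCRE.dtimecre AS KitEnd,
           transactions.job,
           transactions.dtimecre
    FROM transactions
      LEFT JOIN MINDTimeCRE ON transactions.job = MINDTimeCRE.job
      LEFT JOIN MAXDTimeCRE ON transactions.job = MAXDTimeCRE.job
    ORDER BY transactions.dtimecre
""").fetchall()
for r in joined:
    print(r)
```

Every 978437 row gets KitStart 11:00:07 and KitEnd 15:17:20, including the rows in the second block, which is why this approach alone does not separate the split runs.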

PausePause
  • I added some clarification and additional info to my question, thanks for being patient. I want to group by job, but sometimes the same job is split into two pieces with intervening jobs between, so I need to be able to group sequentially, with the same job being allowed more than once. – AGuyNamedKen May 16 '19 at 22:12