1

I'm sorry if this question has been asked here before, but I can't seem to find it. I keep finding how to sum per hour, but my question is regarding SUM and COUNT between timestamps that are defined in another column.

I have one table called incoming_orders: It shows the intended destination, and the timestamp of the incoming order.

I have a second table called scheduled_output: It shows each scheduled output moment for each destination.

I have third table called outgoing_orders: It shows how the actual destination, and the timestamp of the outgoing order.

So, the data could be:

--Incoming_orders:
Destination  Timestamp
ROUTE B      14/03/2018 7:48:00 
ROUTE A      14/03/2018 7:58:00
ROUTE A      14/03/2018 12:48:00
ROUTE C      14/03/2018 13:28:00

--Scheduled_Output
ROUTE A      14/03/2018 8:00:00
ROUTE A      14/03/2018 11:00:00
ROUTE A      14/03/2018 12:00:00
ROUTE A      14/03/2018 17:00:00    
ROUTE B      14/03/2018 8:00:00
ROUTE B      14/03/2018 10:00:00
ROUTE B      14/03/2018 12:00:00
ROUTE C      14/03/2018 07:00:00 
ROUTE C      14/03/2018 14:00:00 
ROUTE C      14/03/2018 17:00:00 

--Which would lead to the following outgoing_orders:
ROUTE A      14/03/2018 8:00:00
ROUTE B      14/03/2018 8:00:00 
ROUTE C      14/03/2018 14:00:00
ROUTE A      14/03/2018 17:00:00

Now, I want to check that the incoming order of 07:58 to route A actually made it into the output cycle of 08:00 for route A. I was thinking of creating a table like this to show it:

Destination output moment   expected_output actual_output   diff
Route A     8:00            1               1               0
Route A     11:00           0               0               0
Route A     12:00           0               0               0
Route A     17:00           1               1               0

But the question is: How do I calculate that expected_output column? How do I group the incoming order to Route A of 12:48 to the 12:00-17:00 group? It should count all the orders between scheduled output moments, but I'm not sure how to accomplish that.

Could I perhaps CEIL, FLOOR or ROUND to the closest scheduled_output value? Or can I somehow do a BETWEEN row n and n+1 with a rowcount? Or is there another, easier way?

Ronald
  • 172
  • 2
  • 11
  • Looking at your scheduled output, your times aren't always the same. Is there any consistent pattern to those times, or can they be anything? If there is some kind of pattern it would make things easier. – paul41 Mar 14 '18 at 15:30
  • Sadly, there is no pattern there; It allows both "every n minutes" as well as "specific output moments". It varies for each route. – Ronald Mar 14 '18 at 15:34
  • I'm thinking you may be able to do something with the date difference like `select scheduled_date - incoming_date from dual;`, but the part I can't figure out is with not knowing what the scheduled ones will be. – paul41 Mar 14 '18 at 15:47

1 Answers1

2

I think it's easiest to determine the previous time of scheduled output, obtaining time intervals, more or less in this way:

SELECT destination,
       time_stamp,
       ( SELECT max( time_stamp ) 
         FROM SCHEDULED_OUTPUT t1
         WHERE t.destination = t1.destination
           AND t1.time_stamp < t.time_stamp
        ) as previous_time_stamp
FROM SCHEDULED_OUTPUT t
order by 1,2

or in a more compact form using analytic funtion:

SELECT destination,
       time_stamp,
       lag( time_stamp ) over (partition by destination order by time_stamp )
       as previous_time_stamp
FROM SCHEDULED_OUTPUT t
order by 1,2

demo: http://sqlfiddle.com/#!4/c7bc9/1

| DESTINATION |            TIME_STAMP |   PREVIOUS_TIME_STAMP |
|-------------|-----------------------|-----------------------|
|     ROUTE A | 2018-03-14 08:00:00.0 |                (null) |
|     ROUTE A | 2018-03-14 11:00:00.0 | 2018-03-14 08:00:00.0 |
|     ROUTE A | 2018-03-14 12:00:00.0 | 2018-03-14 11:00:00.0 |
|     ROUTE A | 2018-03-14 17:00:00.0 | 2018-03-14 12:00:00.0 |
|     ROUTE B | 2018-03-14 08:00:00.0 |                (null) |
|     ROUTE B | 2018-03-14 10:00:00.0 | 2018-03-14 08:00:00.0 |
|     ROUTE B | 2018-03-14 12:00:00.0 | 2018-03-14 10:00:00.0 |
|     ROUTE C | 2018-03-14 07:00:00.0 |                (null) |
|     ROUTE C | 2018-03-14 14:00:00.0 | 2018-03-14 07:00:00.0 |
|     ROUTE C | 2018-03-14 17:00:00.0 | 2018-03-14 14:00:00.0 |

and next the above resultset can be joined to INCOMING_ORDERS in order to calculate counts:

SELECT x.destination, x.time_stamp as output_moment,
       count( y.DESTINATION ) as expected_output 
FROM (
   SELECT destination,
          time_stamp,
          lag( time_stamp ) over (partition by destination order by time_stamp )
          as previous_time_stamp
   FROM SCHEDULED_OUTPUT t
) x
LEFT JOIN INCOMING_ORDERS y
ON x.DESTINATION =  y.DESTINATION
AND y.TIME_STAMP <= x.TIME_STAMP
AND ( y.TIME_STAMP > x.previous_time_stamp OR x.previous_time_stamp IS NULL )
GROUP BY x.destination, x.time_stamp
ORDER BY 1,2

Demo: http://sqlfiddle.com/#!4/c3958/2

| DESTINATION |         OUTPUT_MOMENT | EXPECTED_OUTPUT |
|-------------|-----------------------|-----------------|
|     ROUTE A | 2018-03-14 08:00:00.0 |               1 |
|     ROUTE A | 2018-03-14 11:00:00.0 |               0 |
|     ROUTE A | 2018-03-14 12:00:00.0 |               0 |
|     ROUTE A | 2018-03-14 17:00:00.0 |               1 |
|     ROUTE B | 2018-03-14 08:00:00.0 |               1 |
|     ROUTE B | 2018-03-14 10:00:00.0 |               0 |
|     ROUTE B | 2018-03-14 12:00:00.0 |               0 |
|     ROUTE C | 2018-03-14 07:00:00.0 |               0 |
|     ROUTE C | 2018-03-14 14:00:00.0 |               1 |
|     ROUTE C | 2018-03-14 17:00:00.0 |               0 |

This condition:

AND y.TIME_STAMP <= x.TIME_STAMP
AND ( y.TIME_STAMP > x.previous_time_stamp OR x.previous_time_stamp IS NULL )

tells that if an order is placed at, say 8:00:00 and the route starts at the same time 8:00:00, then this order is still assigned to this "starting" route. If this is not possible (that is - the order must be assigned to the next route when it is placed at the exact time when the route starts), then change the condition to:

AND y.TIME_STAMP < x.TIME_STAMP
AND ( y.TIME_STAMP >= x.previous_time_stamp OR x.previous_time_stamp IS NULL )
krokodilko
  • 35,300
  • 7
  • 55
  • 79