Calculate diff time for combinations of two columns

Question

I need to calculate the time it takes for trucks to go from one station to another. I want to select any origin and destination in my dataset and calculate the average time it took for all trucks.

This is how the data could look like this:

Truck Number	Origin	Destination	Departure	Arrival
1	A	B	01-01-2022	05-01-2022
1	B	C	10-01-2022	15-01-2022
1	C	D	16-01-2022	20-01-2022
1	D	E	20-01-2022	22-01-2022
2	A	B	15-01-2022	25-01-2022

We would need to get all the combinations from origins to destinations for the next X destination for a particular origin. It would be great if it could have a variable so we can set that we want the following X destinations. In the example it would be x=3 since we calculate the different between A and B C D but not AE:

Truck Number	Origin	Destination	Transit Time
1	A	B	5 days
1	A	C	15 days
1	A	D	20 days
1	B	C	5 days
1	B	D	10 days
1	B	E	12 days
1	C	D	4 days
1	C	E	6 days
2	A	B	10 days

Main question would be how to calculate all combinations for x stops in the future.

thanks! recursive cte + cross joins maybe? way above my sql level — fosterXO, Mar 05 '23 at 22:52
What if the truck route is not consistent, perhaps sometimes traveling A>C>B>D or A>E>C>D? Are there any return trips that would create a loop such as A>B>C>A? If history shows additional routes A>W>X>Y>Z>D, do you want to average that in or only calculate the shortest route based on averages for individual legs? The latter case would be a form of the [Shortest path problem](https://en.wikipedia.org/wiki/Shortest_path_problem), where the edges are weighted by the average time between stations. — T N, Mar 05 '23 at 22:52
the average could be calculated later. the sequence would be specified by the truck and departure date order. the main problem would be generating the X combinations: A-B, A-C, A-D. It is possible to have A-A but it would be fine since we would have different dates and we would calculate the difference between the days. how to get the second table? thats the question :) — fosterXO, Mar 05 '23 at 23:11

score 0 · Answer 1 · answered Mar 05 '23 at 19:16

select 
      origin, destination, avg(datediff(day, Departure, Arrival))
from data
GROUP BY origin, destination

Put truck number in the select and remove the aggregation if you want to keep this at the level of the trucks.

I proposed the above, as you might want to get to the average time from A->B then B->C, as components, and then A->C as rollup.

you can do that with lead/lag to look for the next origin to be equal to the prior destination...

This would not help me generating all combinations for X number of stops — fosterXO, Mar 05 '23 at 19:26

Tom Boyd · Answer 2 · 2023-03-06T00:05:20.153

I made a few assumptions. First, the routes only go in one direction, ie, A->B->C->D. Second, truck numbers are unique and don't skip numbers. Third, origins and destinations are always alphabetical.

So, step one was to create a table with all of the possible origin/destination permutations. That was done with the Cross Join and a While loop to iterate through the trucks/origins/destinations. As the same time, I calculated the transit time between each permutation.

Next would be the X number of stops. That's where the Row_Number comes in. Partitioned and ordered the way it is, you can simply select everything less than or equal to your X number of stops (@Destinations). If you set @Destinations to a number greater than what's in the ResultsTable, it will just return the whole table. If you set it to 0 or NULL, it returns nothing.

Finally, in your expected results, you have two different calculations for Transit Time. A>B is 5 days but only if you count both Departure and Arrival. B>C is 5 days but only if you don't count both. I counted both. If you only want to count one, take out the + 1 in the Transit calculation.

Fiddle

DECLARE @Destinations INT = 3
DECLARE @Truck INT = 0
DECLARE @maxTruck INT
DECLARE @ResultTable TABLE 
           (rn INT, TruckNumber INT, Origin CHAR, Destination CHAR, Transit VARCHAR(10));

SELECT @maxTruck = MAX(TruckNumber)
  FROM trips;

WHILE @Truck <= @maxTruck
   BEGIN
       INSERT INTO @ResultTable (rn, TruckNumber, Origin, Destination, Transit)
         SELECT 
           DISTINCT ROW_NUMBER() OVER (PARTITION BY t1.TruckNumber, t1.origin ORDER BY t1.origin, t2.Destination) AS rn,
                    t1.TruckNumber,
                    t1.Origin, 
                    t2.Destination, 
                    DATEDIFF(DAY, t1.departure, t2.Arrival )+ 1 AS Transit
               FROM trips t1
         CROSS JOIN trips t2
              WHERE t2.destination > t1.origin
                AND t1.TruckNumber = @Truck and t2.TruckNumber = @Truck
   SET @Truck = @Truck + 1   
   END

SELECT --rn, 
       TruckNumber,
       Origin,
       Destination,
       CONCAT(Transit, ' Days') Transit
  FROM @ResultTable
 WHERE rn <= @Destinations
 ORDER BY TruckNumber,Origin,Destination

@Destinations = 3 returns:

TruckNumber	Origin	Destination	Transit
1	A	B	5 Days
1	A	C	15 Days
1	A	D	20 Days
1	B	C	6 Days
1	B	D	11 Days
1	B	E	13 Days
1	C	D	5 Days
1	C	E	7 Days
1	D	E	3 Days
2	A	B	11 Days

ahmed · Answer 3 · 2023-03-07T00:17:10.773

Use a recursive cte as the following:

declare @x_lvl int,
        @number_of_origins int;
set @x_lvl = 3;
set @number_of_origins = 3;

with cte as
(
  select row_number() over (partition by TruckNumber order by departure) rn,
         TruckNumber, Origin, 
         Destination, departure, 
         arrival, 1 as rec_level 
  from tbl
  union all 
  select C.rn,
         T.TruckNumber, C.Origin, 
         T.destination, C.departure, 
         T.arrival, C.rec_level +1
  from tbl T join cte C
  on T.TruckNumber = C.TruckNumber and
     T.Origin = C.Destination and
     T.departure > C.departure
  where C.rec_level < 3
)
select TruckNumber, 
       Origin, 
       Destination,
       Concat(Datediff(day,departure, Arrival) + 1, ' days') Transit_Time
from cte
where rn <= @number_of_origins
order by TruckNumber, Origin, Arrival

@x_lvl is used to limit the number of retrieved destinations per origin.

@number_of_origins is to limit the number of origins per truck.

See a demo

score 0 · Answer 4 · answered Mar 06 '23 at 02:36

This may be as simple as a self join that pairs any trip with any same or later trip for the same truck. This would select all combinations of origins and destinations for each truck. Adding a limit requires the addition of a CTE to assign a windowed row number, that can then be used later to limit the hops. A date diff, group by, and average calculation will then yield the desired results.

DECLARE @Limit INT = 3

;WITH NumberedTrips AS (
    SELECT
        T.*,
        ROW_NUMBER() OVER(PARTITION BY T.TruckNumber ORDER BY T.Departure) AS LegNum
    FROM Trips T
)
SELECT
    T1.Origin, T2.Destination,
    AVG(0e0 + DATEDIFF(day, T1.Departure, T2.Arrival)) AS TransitTime
FROM NumberedTrips T1
JOIN NumberedTrips T2
    ON T2.TruckNumber = T1.TruckNumber
    AND T2.LegNum BETWEEN T1.LegNum AND T1.LegNum + @Limit - 1
GROUP BY T1.Origin, T2.Destination
ORDER BY T1.Origin, T2.Destination

Sample results:

Origin	Destination	TransitTime
A	B	7
A	C	14
A	D	19
B	C	5
B	D	10
B	E	12
C	D	4
C	E	6
D	E	2

The above is slightly different from your posted expected results, which were not consistent in the date difference calculation. Your first 3 rows added +1 to the actual date difference (inclusive day counting perhaps?), but the remaining rows did not.

See this db<>fiddle for a working demo.

Calculate diff time for combinations of two columns

4 Answers4