2

I have a table in a SQL Server 2008 R2 database that defines transitions between different states.

Example relevant table columns:

TransitionID | SourceStateID | DestinationStateID | WorkflowID
--------------------------------------------------------------
1            | 17            | 18                 | 3
2            | 17            | 21                 | 3
3            | 17            | 24                 | 3
4            | 17            | 172                | 3
5            | 18            | 17                 | 3
6            | 18            | 24                 | 3
7            | 18            | 31                 | 3
8            | 21            | 17                 | 3
9            | 21            | 20                 | 3
10           | 21            | 141                | 3
11           | 21            | 184                | 3

... etc.

My goal is to be able to define a starting StateID, an ending StateID, and the WorkflowID, and hopefully find a logical path from the starting StateID to the ending StateID within that workflow.

e.g. For the table above, if I supplied a starting StateID of 17, and ending StateID of 184, and a WorkflowID of 3, I could wind up with a 'path' of 17>21>184 (or more ideally, TransitionID 2 > TransitionID 11)

The Good:

  • The Starting and Ending StateID's defined will always have a possible path within the defined WorkflowID

The Bad:

  • There are certainly circular references on virtually every source/destination StateID (i.e. there's likely a transition from SourceStateID 1 to DestinationStateID 2, and also from SourceStateID 2 to DestinationStateID 1

  • There are certainly multiple possible paths from virtually any defined starting StateID and ending StateID

I realize it's some manner of CTE I'm after, but admittedly I find CTE's hard to grasp in general, and doubly so when circular references are a guaranteed problem.

The perfect solution would be one that selects the shortest path from the starting StateID to the ending StateID, but to be honest at this point if I can get something working that will reliably give me any valid path between the two states I'll be so happy.

Any SQL gurus out there have some direction you could point me in? I honestly don't even know where to begin to prevent circular issues like getting a path along the lines of 17>18>17>18>17>18...

DISGUSTING UPDATE I came up with this query which hurts me on an emotional level (with some help from this post: https://stackoverflow.com/a/11042012/3253311), but appears to be working.

DECLARE @sourceStatusId INT = 17
DECLARE @destinationStatusId INT = 24
DECLARE @workflowId INT = 3

DECLARE @sourceStatusIdString NVARCHAR(MAX) = CAST(@sourceStatusId AS             NVARCHAR(MAX))
DECLARE @destinationStatusIdString NVARCHAR(MAX) = CAST(@destinationStatusId     AS NVARCHAR(MAX))
DECLARE @workflowIdString NVARCHAR(MAX) = CAST(@workflowId AS NVARCHAR(MAX))

;WITH CTE ([Source], [Destination], [Sentinel]) AS 
(
SELECT
    CAST(t.[Source] AS NVARCHAR(MAX)) AS [Source],
    CAST(t.[Destination] AS NVARCHAR(MAX)) AS [Destination],
    [Sentinel] = CAST([Source] AS NVARCHAR(MAX))
FROM
    Transitions t
WHERE
    CAST([Source] AS NVARCHAR(MAX)) = @sourceStatusIdString AND 
    CAST([WorkflowID] AS NVARCHAR(MAX)) = @workflowIdString

UNION ALL

SELECT
    CAST(CTE.[Destination] AS NVARCHAR(MAX)),
    CAST(t.[Destination] AS NVARCHAR(MAX)),
    CAST([Sentinel] AS NVARCHAR(MAX)) + CAST('|' AS NVARCHAR(MAX)) + CAST(CTE.[Destination] AS NVARCHAR(MAX))
FROM
    CTE
    JOIN Transitions t
        ON CAST(t.[Source] AS NVARCHAR(MAX)) = CAST(CTE.[Destination] AS     NVARCHAR(MAX))
WHERE
    CHARINDEX(CTE.[Destination], Sentinel)=0
)

SELECT TOP 1
[Sentinel]
FROM
CTE
WHERE
LEFT([Sentinel], LEN(@sourceStatusIdString)) = @sourceStatusIdString AND 
RIGHT([Sentinel], LEN(@destinationStatusIdString)) =     @destinationStatusIdString 
ORDER BY
LEN([Sentinel])
Community
  • 1
  • 1
McFixit
  • 437
  • 3
  • 15
  • There is no amount of sql or even logic that can solve this. You have one row with parentID 17 and a child of 18, in the same group you have a parentID of 18 and a child of 17. You don't even have any data in here which indicates what the base or starting point is. You have an unsolvable riddle here. Think of it like this. The chicken is 17 and the egg is 18. Which one came first? – Sean Lange Apr 15 '16 at 20:40
  • 1
    @SeanLange While I agree that it's a pickle, I respectfully disagree that it "can't be solved". I've just come up with a solution that _works_, however its implementation hurts me on an emotional level. I'm going to check out the answer by Quassnoi now and hopefully I won't need to use what I came up with :) – McFixit Apr 15 '16 at 20:58

3 Answers3

3

Similar to Quassnoi's answer, but prevents circular references that start after the first element:

DECLARE @src int = 18, @dst int = 184;
WITH cte  AS 
(Select TransitionId, SourceStateId, DestinationStateID, SourceStateID as StartingState, 0 as Depth
  , cast(TransitionId as varchar(max)) + ','  as IDPath  
  , cast(SourceStateID as varchar(max)) + ',' as States
 FROM Transitions
 WHERE SourceStateId = @src
 UNION ALL
 Select t.TransitionId, t.SourceStateId, t.DestinationStateID, StartingState, cte.Depth + 1
  , cte.IDPath + cast(t.TransitionId as varchar(max)) + ',' 
  , cte.States + cast(t.SourceStateID as varchar(max)) + ','
 FROM cte JOIN Transitions t on cte.DestinationStateID = t.SourceStateId 
 and CHARINDEX(',' + cast(t.SourceStateID as varchar(max)) + ',', States) = 0 --prevent loop starting after first element
 and t.DestinationStateID <> StartingState --prevent loop starting with first element
 where cte.Depth < 10 -- prevent going too deep
 )
 select TransitionId, SourceStateId, DestinationStateID, Depth, left(IDPath, len(IDPath) - 1) IDPath, left(States, len(States) - 1) States
 from cte
 where DestinationStateID = @dst
 order by Depth
Kateract
  • 822
  • 6
  • 15
  • 1
    Yours seems to run the fastest out of the working answers. I wound up going with something slightly different, but this fits the bill..thanks! – McFixit Apr 18 '16 at 14:10
2
WITH    q (id, src, dst, sid, cnt, path) AS
        (
        SELECT  transitionId, sourceStateId, destinationStateId, sourceStateId, 1,
                CAST(transitionId AS VARCHAR(MAX)) path
        FROM    mytable
        WHERE   sourceStateId = 18
        UNION ALL
        SELECT  m.transitionId, m.sourceStateId, m.destinationStateId,
                CASE WHEN sid < sourceStateId THEN sid ELSE sourceStateId END, cnt + 1,
                path + ', ' + CAST(transitionId AS VARCHAR(MAX))
        FROM    q
        JOIN    mytable m
        ON      m.sourceStateId = q.dst
                AND m.sourceStateId <> q.sid
        )
SELECT  TOP 1
        *
FROM    q
WHERE   dst = 184
ORDER BY
        cnt DESC

See the fiddle: http://www.sqlfiddle.com/#!6/9342e/17

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
0

Here's my take on it... Building on Quassnoi's, but shorter than Kateract ;-)

http://www.sqlfiddle.com/#!6/322d3/4

WITH data AS (
    SELECT  TransitionId, SourceStateId, DestinationStateId,
        '|' + CAST(transitionId AS VARCHAR(MAX)) as Path
    FROM    States
    WHERE   sourceStateId = 17
    AND WorkflowID = 3
    UNION ALL
    SELECT  m.TransitionId, m.sourceStateId, m.DestinationStateId,
        Path + '|' + CAST(m.TransitionId AS VARCHAR(MAX))
    FROM data d
        JOIN States m ON
            m.sourceStateId = d.DestinationStateId
        AND CHARINDEX( '|' + convert(varchar(10),m.TransitionID) + '|', path) = 0
        AND WorkflowID = 3
)
SELECT TOP 1 *
FROM data
WHERE DestinationStateId = 184
ORDER BY LEN(Path)
-- If you want the longest path... ORDER BY LEN(Path) DESC
Spock
  • 4,700
  • 2
  • 16
  • 21