I have two table schemas (MySQL 5.6 so no CTE), roughly looking like this:
CREATE TABLE nodes (
node_id INT PRIMARY KEY,
name VARCHAR(10)
);
CREATE TABLE edges (
edge_id INT PRIMARY KEY,
source INT,
target INT,
FOREIGN KEY (source) REFERENCES nodes(node_id),
FOREIGN KEY (target) REFERENCES nodes(node_id)
);
In our design, a logical edge between two nodes (logically n1 -> n2
) is actually represented as (n1 -> proxy node -> n2
) in the db. The reason we use two edges and a proxy node for a logical edge is so that we can store properties on the edge. Therefore, when a client queries for two nodes connected by an edge, the query is translated to query three connected nodes instead.
I have written a query to get a path with a fixed length. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with exactly 5 edges on the path." This is done without using recursion on the SQL side; I just generate a long query programmatically with the specified fixed length.
The challenge is, we want to support querying of a variable-length path. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with no fewer than 3 edges and no more than 10 edges on the path." Is this feasible without (or even with) CTE?
EDIT:
Some sample data:
-- Logical nodes are 1, 3, 5, 7, 9, 11. The rest are proxy nodes.
INSERT INTO nodes VALUES
(1, 'foo'),
(2, '_proxy_'),
(3, 'foo'),
(4, '_proxy_'),
(5, 'bar'),
(6, '_proxy_'),
(7, 'bar'),
(8, '_proxy_'),
(9, 'bar'),
(10, '_proxy_'),
(11, 'bar');
-- Connects 1 -> 2 -> ... -> 11.
INSERT INTO edges VALUES
(1, 1, 2),
(2, 2, 3),
(3, 3, 4),
(4, 4, 5),
(5, 5, 6),
(6, 6, 7),
(7, 7, 8),
(8, 8, 9),
(9, 9, 10),
(10, 10, 11);
The query can be, "select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path." Such paths include 1 -> 3 -> 5
, 1 -> 3 -> 5 -> 7
, 3 -> 5
, 3 -> 5 -> 7
, and 3 -> 5 -> 7 -> 9
. So the result set should include the IDs and names of nodes 1, 3, 5, 7, 9.