Finding cycles in a graph (not necessarily Hamiltonian or visiting all the nodes)

Question

I have a graph like the one in Figure 1 (the first image) and want to connect the red nodes into a cycle, but the cycle does not have to be Hamiltonian, as in Figure 2 and Figure 3 (the last two images). The search space is much bigger than for the TSP, since a node may be visited twice. As with the TSP, it is impossible to evaluate all combinations in a large graph, so I should try a heuristic. Unlike the TSP, however, the length of a cycle or tour is not fixed here: visiting all the blue nodes is not mandatory, so a tour has variable length and includes only some of the blue nodes. How can I generate a "valid" combination every time for evaluation? For example, a cycle can be {A, e, B, l, k, j, D, j, k, C, g, f, e} or {A, e, B, l, k, j, D, j, i, h , g, C, g, f, e}, but not {A, e, B, l, k, C, g, f, e} or {A, B, k, C, i, D}.
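
To make "valid" concrete, here is a minimal Python sketch of a validity check. The edge list is an assumption: it is inferred from the example cycles above, and the real Figure 1 may contain more edges. The names `EDGES`, `RED`, `build_adjacency` and `is_valid_cycle` are hypothetical.

```python
# Edge list inferred from the example cycles in the question (an assumption;
# the actual figure may contain additional edges).
EDGES = [("A", "e"), ("e", "B"), ("B", "l"), ("l", "k"), ("k", "j"),
         ("j", "D"), ("k", "C"), ("j", "i"), ("i", "h"), ("h", "g"),
         ("g", "C"), ("g", "f"), ("f", "e")]
RED = {"A", "B", "C", "D"}


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_valid_cycle(seq, adj, red):
    """seq lists nodes in visiting order; the last node is joined back to the first.

    A sequence counts as valid here if it visits every red node and every
    consecutive pair (including last -> first) is an edge of the graph.
    """
    if len(seq) < 2 or not set(red) <= set(seq):
        return False
    closed = list(seq) + [seq[0]]
    return all(b in adj.get(a, ()) for a, b in zip(closed, closed[1:]))


ADJ = build_adjacency(EDGES)
```

Under these assumptions the two accepted sequences pass the check, while {A, e, B, l, k, C, g, f, e} fails because D is missing and {A, B, k, C, i, D} fails because A and B are not adjacent.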

Update: The final goal is to find which cycle is optimal or near-optimal considering both length and risk (see below). So I want to minimize not only the length but also the risk. The risk of a cycle cannot be evaluated unless the full sequence of its nodes is known. I hope this clarifies why I cannot evaluate a cycle in the middle of generating it. We can:

generate and evaluate possible cycles one by one;
or generate all possible cycles and then do their evaluation.

Definition of the risk: Assume a cycle is a ring that connects the primary node (one of the red nodes) to all other red nodes. If any part (edge) of the ring fails, no red node should be disconnected from the primary node (this is the desired case). However, there are some edges we have to pass twice (because there is no Hamiltonian cycle connecting all the red nodes), and if one of those edges fails, some red nodes may be totally disconnected. So the risk of a cycle is the sum, over the risky edges (those that appear twice in the ring/tour), of the edge length multiplied by the number of red nodes we lose if that edge is cut.
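
The definition above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: it assumes a red node is "lost" when it can no longer be reached from the primary node using only the remaining edges of the tour, and that any edge without an explicit length has length 1. The names `reachable_from`, `tour_risk` and `edge_length` are hypothetical.

```python
from collections import Counter


def reachable_from(source, edges):
    """Return the set of nodes reachable from source using the given undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def tour_risk(seq, primary, red, edge_length=None):
    """Sum of length * (red nodes lost) over every edge the tour uses twice.

    seq is the visiting order; the last node is joined back to the first.
    edge_length maps frozenset({u, v}) to a length (assumed 1.0 if absent).
    """
    edge_length = edge_length or {}
    closed = list(seq) + [seq[0]]
    usage = Counter(frozenset(pair) for pair in zip(closed, closed[1:]))
    risk = 0.0
    for edge, count in usage.items():
        if count < 2:
            continue  # an edge used once is protected by the rest of the ring
        remaining = [tuple(e) for e in usage if e != edge]
        lost = len(set(red) - reachable_from(primary, remaining))
        risk += edge_length.get(edge, 1.0) * lost
    return risk
```

For the tour {A, e, B, l, k, j, D, j, k, C, g, f, e} with unit lengths and primary node A, the risky edges are A-e (cutting it loses B, C and D), k-j and j-D (each loses D), so this sketch gives a risk of 5. Note that the whole sequence is needed before the sum can be computed.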

A real example of the 3D graph I am working on, with 5 red nodes and 95 blue nodes, is shown below: (image)

And here is a link to an Excel sheet containing the adjacency matrix of the above graph (the first five nodes are red and the rest are blue).

Upon a bit of reflecting the mapping approach is probably inefficient in the case where red nodes can be used twice, too much duplication of work. Thus I rewrote my answer to use a slightly different approach. Also I don't suppose you'd post a couple sample adjacency matrices/lists with the red nodes. That way I can do some testing myself out of curiosity. — Nuclearman, May 11 '13 at 20:58
The sample I have is an Excel file containing the adjacency matrix of the graph whose figure I posted. Would it be suitable for your test if I uploaded it here? — Barpa, May 12 '13 at 21:51
To clarify, what exactly are you looking for? Are you looking for all tours that go through red nodes? Are you looking for the shortest tour (in terms of how many blue and/or red nodes it contains) that goes through all red nodes? Are you looking for just a tour that goes through all red nodes? — Nuclearman, May 13 '13 at 15:52
Oh and what are the red nodes for that? Or are they chosen independently from the graph? — Nuclearman, May 13 '13 at 15:54
I removed my posted code to avoid mess since you removed your previous suggestion. I also updated my post to answer your questions. — Barpa, May 13 '13 at 17:02
Thanks, the definition of risk is still somewhat weak. However, it sounds like the overall goal is simply to keep the tour as short as possible. It looks like the nodes may actually be 3D points, is this correct? The reason I ask is that heuristic solutions are easier to come by for a point set. I should have asked before as from the looks of it, you may actually be losing information that could be useful for heuristic purposes by converting it to an adjacency matrix, although the matrix itself is also very useful, as it limits the edges that need to be checked. — Nuclearman, May 13 '13 at 19:34
I'm seeing an interesting pattern, assuming the nodes are points, most of the red/blue points lie on intersections. It might be possible to write an algorithm that uses Manhattan distances. If risk is based on the distance between nodes, then it should be possible to quickly (again relative term) find the best path or at least a fairly good one in polynomial time. In any case, if that 100 node graph is actually a point set, I would appreciate the points to go with the nodes. Also feel free to delete old comments that are no longer relevant. — Nuclearman, May 13 '13 at 20:00
I added definition of the risk. As I explained in my last update, it is not possible to know risk of a cycle unless you know all consisting nodes and their sequence. How do you know which edge would be used twice in a tour before having the whole tour? — Barpa, May 13 '13 at 22:14
Regarding the data: as you guessed, the raw data contains 3D coordinate of the nodes. However, I should have permission of the owner to make it public, sorry. — Barpa, May 13 '13 at 22:31
True, you can't evaluate the full risk, but you could track the partial sum and use a priority queue instead of a stack or regular queue. For example, everything else being the same, a cycle that contains C-k-j-D-j-k, where the j-k edge is used twice, would have a lower priority than a cycle that used C-k-j-D-j-i. In that way, if a tour is found with a risk of 3, then no tour will have a risk that is less than 3, though there may be others with the same value. This doesn't account for the length of the path, but it should be possible to determine a cut-off point of length vs. risk. — Nuclearman, May 13 '13 at 22:36
It's probably more trouble than it's worth then for the raw data. In any case, I recommend you look into ways a 3D TSP (ideally with Manhattan distances) can be solved, as there'll probably be some insights. — Nuclearman, May 13 '13 at 22:40
I see your point, but if you have complete cycles containing the node sequences of your example, in some ways the first sequence has more risk. That is, cycle C-k-j-D-j-k-l-B-e-A-e-f-g-C has lower risk than C-k-j-D-j-k-l-B-e-A-e-f-g-c. Do you see the point? However, your suggestion gave me an idea that I am going to try soon. I implemented code to find a near-shortest cycle using SA, which gives a result in seconds. I can use the risk of that near-shortest cycle as an upper bound to accept or reject partial cycles generated by your algorithm. I think this works, many thanks. — Barpa, May 14 '13 at 15:22
Aren't C-k-j-D-j-k-l-B-e-A-e-f-g-C and C-k-j-D-j-k-l-B-e-A-e-f-g-c the same? There may be an error there somewhere. In any case, the example wasn't intended to be definitive; it was merely intended to show that a priority queue could be used to find the lowest risk. Although the higher the risk of the lowest-risk tour, the longer it'll take. In any case, it sounds like you have a fair idea, but I think it needs to be a lower bound. Then again, I'm not following you 100%, and there are some ways risk can be used as an upper bound, so it might be valid. — Nuclearman, May 14 '13 at 15:43
Edit: sorry, that is a mistake :). I meant C-k-j-D-j-k-l-B-e-A-e-f-g-C and C-k-j-D-j-i-h-g-f-e-A-e-B-e-f-g-c. And yes, the risk of the shortest cycle is an upper bound for all other new cycles we find, because a new cycle whose risk is more than our upper bound definitely cannot have a length less than the shortest cycle. Thus, we can reject it. — Barpa, May 14 '13 at 16:19

Nuclearman · Accepted Answer · 2013-05-13T19:17:05.067


Upon a bit more reflection, I decided it's probably better to just rewrite my solution, as the fact that you can use red nodes twice makes my original idea of mapping out the paths between red nodes inefficient. However, it isn't completely wasted, as the set of blue nodes between red nodes is important.

You can actually solve this using a modified version of BFS, used more or less as a backtracking algorithm. For each unique branch the following information is stored. Only the first two items are actually required; the rest simply allow faster rejection at the cost of more space:

The full current path. (list with just the starting red node)
The remaining red nodes. (initially all red nodes)
The last red node. (initially the start red node)
The set of blue nodes since last red node. (initially empty)
The set of nodes with a count of 1. (initially empty)
The set of nodes with a count of 2. (initially empty)

The algorithm starts with a single node and then expands adjacent nodes using BFS or DFS; this repeats until the result is a valid tour or the node to be expanded is rejected. The basic pseudocode (tracking only the current path and the remaining red nodes) looks something like the code below, where rn is the set of red nodes, t is the list of valid tours, p/p2 is a path of nodes, r/r2 is a set of red nodes, v is the node to be expanded, and a is a possible node to expand to.

function PATHS2HOME(G,rn)
    create a queue Q
    create a list t
    p ← empty list
    v ← rn.pop()
    r ← rn
    add v to p
    Q.enqueue((p,r))
    while Q is not empty
        p, r ← Q.dequeue()
        if r is empty and the first and last elements of p are the same:
            add p to t
        else
            v ← last element of p
            for all vertices a in G.adjacentVertices(v) do 
                if canExpand(p,a)
                    p2 ← copy(p)
                    r2 ← copy(r)
                    add a to the end of p2
                    if isRedNode(a) and a in r2
                        remove a from r2
                    Q.enqueue( (p2,r2) )
    return t
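
For readers who want something executable, here is a rough Python translation of the pseudocode above. It is a sketch under stated assumptions rather than the exact algorithm: the start node is passed in explicitly instead of being popped from the set, `canExpand` is replaced by the single rule "no node appears more than `max_visits` times in the path", and the start node's first and last appearances both count toward that limit. The names `paths_to_home`, `adj` and `max_visits` are hypothetical.

```python
from collections import deque


def paths_to_home(adj, red, start, max_visits=2):
    """Breadth-first enumeration of closed tours that visit every red node.

    adj is {node: set(neighbours)}; red is the set of red nodes; start is the
    red node where the tour begins and ends. Returns tours as lists of nodes,
    shortest first (a property of breadth-first order).
    """
    tours = []
    # Each queue entry is (path so far, red nodes still to visit).
    queue = deque([([start], frozenset(red) - {start})])
    while queue:
        path, remaining = queue.popleft()
        if not remaining and len(path) > 1 and path[-1] == path[0]:
            tours.append(path)  # complete tour: do not expand it further
            continue
        for nxt in sorted(adj[path[-1]]):
            # Stand-in for canExpand: reject a node already used max_visits times.
            if path.count(nxt) >= max_visits:
                continue
            queue.append((path + [nxt], remaining - {nxt}))
    return tours


# Tiny example: a square A-x-B-y-A with red nodes A and B.
SQUARE = {"A": {"x", "y"}, "B": {"x", "y"}, "x": {"A", "B"}, "y": {"A", "B"}}
SQUARE_TOURS = paths_to_home(SQUARE, {"A", "B"}, "A")
```

Because every node may appear at most `max_visits` times, each path has bounded length and the loop terminates, but the number of branches still grows exponentially with graph size.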

The following conditions prevent expansion of a node. May not be a complete list.

Red nodes:
- If it is in the set of nodes that have a count of 2. This is because the red node would have been used more than twice.
- If it is equal to the last red node. This prevents "odd" tours when a red node is adjacent to three blue nodes. Say the red node A is adjacent to blue nodes b, c and d. Without this check you could end up with a tour where part of the tour looks like b-A-c-A-d.
Blue nodes:
- If it is in the set of nodes that have a count of 2. This is because the blue node would have been used more than twice.
- If it is in the set of blue nodes since last red node. This is because it would cause a cycle of blue nodes between red nodes.
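
The rejection rules above, together with the per-branch state listed earlier, could be sketched in Python like this. The class and function names (`Branch`, `start_branch`, `can_expand`, `extend`) are hypothetical, and one detail is an assumption: the start node is recorded as already seen once, whereas the list above describes the count sets as initially empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    path: tuple                               # full current path
    remaining_red: frozenset                  # red nodes not yet visited
    last_red: str                             # most recent red node on the path
    blue_since_red: frozenset = frozenset()   # blue nodes since last_red
    seen_once: frozenset = frozenset()        # nodes with a count of 1
    seen_twice: frozenset = frozenset()       # nodes with a count of 2


def start_branch(start, red):
    # Assumption: the start node already counts as seen once.
    return Branch((start,), frozenset(red) - {start}, start,
                  seen_once=frozenset({start}))


def can_expand(branch, node, red):
    """Apply the rejection rules for red and blue nodes."""
    if node in branch.seen_twice:
        return False                          # would be used more than twice
    if node in red:
        return node != branch.last_red        # blocks patterns like b-A-c-A-d
    return node not in branch.blue_since_red  # blocks blue-only cycles


def extend(branch, node, red):
    """Return a new Branch with node appended and all bookkeeping updated."""
    if node in branch.seen_once:
        once, twice = branch.seen_once - {node}, branch.seen_twice | {node}
    else:
        once, twice = branch.seen_once | {node}, branch.seen_twice
    is_red = node in red
    return Branch(
        path=branch.path + (node,),
        remaining_red=branch.remaining_red - {node},
        last_red=node if is_red else branch.last_red,
        blue_since_red=frozenset() if is_red else branch.blue_since_red | {node},
        seen_once=once,
        seen_twice=twice,
    )
```

All checks are set lookups, so rejecting a candidate costs constant time instead of a scan over the path, which is the space-for-speed trade mentioned earlier.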

Possible optimizations:

You could map out the paths between red nodes and use that to build something like a suffix tree that shows which red nodes can be reached given the following path. The benefit here is that you avoid expanding a node if that expansion leads to red nodes that have already been visited twice. Thus this is only a useful check once at least 1 red node has been visited twice.
Use a parallel version of the algorithm. A single thread could be accessing the queue, and there is no interaction between elements in the queue. Though I suspect there are probably better ways. It may be possible to cut the runtime down to seconds instead of hundreds of seconds. Although that depends on the level of parallelization, and efficiency. You could also apply this to the previous algorithm. Actually the reasons for which I switched to using this algorithm are pretty much negated by
You could use a stack instead of a queue. The main benefit here is by using a depth-first approach, the size of the queue should remain fairly small.
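
As a sketch of the stack-based variant, the generator below explores depth-first and yields tours one at a time, so each tour can be evaluated as soon as it is produced instead of storing them all. It makes the same simplifying assumption as a plain "at most twice" rule (no node appears more than `max_visits` times in the path, counting the start node's first and last appearance); the names `iter_tours` and `max_visits` are hypothetical.

```python
from itertools import islice


def iter_tours(adj, red, start, max_visits=2):
    """Lazily yield closed tours through all red nodes, depth-first.

    adj is {node: set(neighbours)}; tours are yielded as tuples of nodes.
    """
    # Each stack entry is (path so far, red nodes still to visit).
    stack = [((start,), frozenset(red) - {start})]
    while stack:
        path, remaining = stack.pop()
        if not remaining and len(path) > 1 and path[-1] == start:
            yield path
            continue
        # Push in reverse order so the alphabetically first neighbour is tried first.
        for nxt in sorted(adj[path[-1]], reverse=True):
            if path.count(nxt) < max_visits:
                stack.append((path + (nxt,), remaining - {nxt}))


# Tiny example: a square A-x-B-y-A with red nodes A and B.
SQUARE = {"A": {"x", "y"}, "B": {"x", "y"}, "x": {"A", "B"}, "y": {"A", "B"}}
FIRST_THREE = list(islice(iter_tours(SQUARE, {"A", "B"}, "A"), 3))
```

Unlike the breadth-first version, tours do not come out shortest first, so a bound such as the risk of a known good tour would have to be checked per tour by the caller.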

edited May 13 '13 at 19:17

answered Apr 17 '13 at 17:16

Nuclearman


First, I should say many thanks for your nice answer. I think I mostly got your point, but there are more things to clear up. I was thinking about using Simulated Annealing, but what kind of neighborhood function can be used to produce a combination slightly different from the previous one? In the common TSP the neighborhood function usually replaces some of the nodes randomly and produces a new combination, but here we need two kinds of replacement (red nodes, and the connecting nodes, which are blue). So a newly produced combination may or may not be a neighbor. Please let me know if this is not clear. – Barpa Apr 17 '13 at 19:34
It shouldn't be too difficult to get Simulated Annealing to work. Although how well depends largely on input. I've added a section to my answer that goes into more detail. – Nuclearman Apr 17 '13 at 20:37
I tried to implement your solutions but faced a problem. There are some cases like A-e-B-l-k-C-g-h-i-j-D-j-i-h-g-C-k-l-B-e-A that I need to have in the set of possible solutions, and your approach is not able to produce all of them. I was thinking about listing all the ways we can connect a red node to another even if there is a red node in between, but that's also NP. – Barpa May 02 '13 at 17:42
It sounded like that wasn't a possible solution as it uses the red nodes twice. The fix would seem to be simply to allow any node to be used up to two times. This will probably effectively double the input size, which is rather bad for NP-Hard problems, but I don't see a way around it either. – Nuclearman May 03 '13 at 02:20
I implemented a piece of code to build the mapping you suggested, but its running time is long for graphs with many nodes. For instance, for a graph of 5 red nodes and 95 blue nodes it took `752.133771 secs` to find all the paths. What I tried is a brute-force search, which is expected to be slow, but I am wondering if there is any way to build the mapping faster. Do you know a fast algorithm to do it? Sorry for asking many questions and thanks for your help. – Barpa May 08 '13 at 17:48
You can certainly get better than brute force, but not by much. You might be able to improve things a bit by making use of one of the techniques outlined [here](http://en.wikipedia.org/wiki/Hamiltonian_path_problem#Algorithms), though they'll have to be tweaked some for this specific problem, beyond simply looking for tours instead of paths. The issue though is that unlike TSP, a heuristic or approximate approach can't be used (at least not without the possibility of it being wrong), thus you can't avoid the exponential run time, at least if P != NP. Also no worries about the questions. – Nuclearman May 08 '13 at 22:22
Thanks for proposing a new solution. I tried to digest your algorithm, but I think there are some things missing or wrong, or perhaps I cannot understand it. As I traced it, `r` is always empty. Don't we need to copy `rn` into `r` before entering the `while` loop? I also doubt that this approach can give a fast result, because I tried something similar to find all Hamiltonian cycles in the graph and it was not fast (but if I understand your algorithm well I will try it to see how fast it is). – Barpa May 12 '13 at 21:44
Updated code for rn. Fast is a relative term here. I'd be surprised if you didn't try a similar approach. However, the key thing to remember is that you aren't looking for Hamiltonian cycles, you are looking for cycles of red nodes, where red/blue nodes can be used twice. This isn't computationally trivial, and an algorithm that finds a Hamiltonian tour isn't strictly suitable. I'll update with a few things that can improve the algorithm. – Nuclearman May 13 '13 at 07:37
Yes, you are right, but I meant even when you are going to find Hamiltonian cycles which connect red nodes, as much as possible, it is computationally time consuming. This is what we should expect when trying all possible solutions in such problems, I think. – Barpa May 13 '13 at 11:52
True, but the previous approach did essentially the same thing, it just precomputed the possible paths. Then again, I suppose it did manage it without tracking so much information. The updates occurred by red nodes, and set operations are fairly quick. Though that approach is inefficient for its own reasons, but perhaps less so than this one. Anyway, I'm going to look over the graph you sent me; I've already converted it into an adjacency list (of sorts, using dictionaries and sets). I've already thought of a minor optimization that can be done. I've posted a clarifying question above as well. – Nuclearman May 13 '13 at 15:50
I noticed another issue in your algorithm. The algorithm is not able to produce complete tour. I mean, for connecting tail of a cycle to its head we may have many choices, unlike TSP tours. I think we should add starting node twice in `r`. – Barpa May 13 '13 at 17:09
Good point the end condition should be expanded, I'll update the code. – Nuclearman May 13 '13 at 19:15
Thanks a lot for all your help. Although I have not yet tried your solution in practice, it generally seems logical, so I accepted the answer. – Barpa May 14 '13 at 16:23
