Depth First Search in MySQL

Question

I'm trying to write a MySQL PROCEDURE which takes an edge e and an edge set eset as inputs and outputs a boolean value iscyclic to determine whether the additional edge results in a cyclic graph. Would there be any more straightforward way to do this other than creating a table of all the vertices with a column for something like "visit count" and then checking if any vertex is visited more than once while running through the edge set?

First reaction: why would you want to use SQL to do something like that? Is there any chance that you can do this in another part of your system? — Apiwat Chantawibul, Nov 23 '12 at 21:19
Unfortunately no, it was specified that this needs to be done in SQL. — Mike, Nov 23 '12 at 22:01
If given that a graph is `connected` then the following statement holds: The graph is `acyclic` (is a tree) if and only if the `number of nodes - 1 = number of edges`. — Apiwat Chantawibul, Nov 23 '12 at 22:38
Is your `eset` guaranteed to be a tree? If so, you simply need to check that edge `e` has 1 end connected to `eset` and another end **not** connected to `eset` to make sure that `e union eset` is also a tree. — Apiwat Chantawibul, Nov 23 '12 at 22:42
The fact about acyclic trees was something I didn't realize. eset couldn't be guaranteed to be connected because it is being populated as an MST is being constructed (by Kruskal's algorithm) by another function in which is_cyclic() is called. The first time the is_cyclic() function is called, the eset would be empty. From there on, it would either have an edge inserted or discarded at every iteration, until all the possible edges are considered. — Mike, Nov 23 '12 at 22:48
Perhaps, if you could keep track of connected group of nodes while MST is being constructed, you'll be able to (more easily) answer whether a given `eset` is connected or not. — Apiwat Chantawibul, Nov 23 '12 at 22:55
In fact, the standard way to do Kruskal need that datastructure to keep track of connected groups. see the word "disjoint-set data structure" on http://en.wikipedia.org/wiki/Kruskal's_algorithm — Apiwat Chantawibul, Nov 23 '12 at 23:01
Right, I'm not sure about which data structures I can implement in MySQL, my only idea was the table I mentioned in the original post. You made me realize something else though, I'm going to have to keep track of cycles in the individual components of the forest until they form an MST. — Mike, Nov 23 '12 at 23:07
Would this mean that I need to keep each "sub-tree" in the forest as a separate eset? — Mike, Nov 23 '12 at 23:08
If you're using a relational database to hold the nodes and edges, I presume you must have a large graph. In that case you might be interested in *external memory* algorithms for MST, which are specifically designed to be efficient in the case where the entire graph doesn't fit in random-access memory. You could start by searching within the page "http://en.wikipedia.org/wiki/Minimum_spanning_tree#Algorithms" for "Engineering an External Memory Minimum Spanning Tree Algorithm". — j_random_hacker, Nov 26 '12 at 06:35

score 3 · Accepted Answer · edited Feb 02 '13 at 12:04

As the comments of Billiska indicate, you need to keep track of the individual trees of your forest, i.e. the connected sets.

The easiest implementation of a disjoint set data structure would consist of a single temporary table, which maps the ID of every vertex to the ID of a parent. You can follow these parent links from one vertex to the next until you end up with the root of this tree, which is a vertex pointing at itself. This root serves as a unique representative that identifies the whole connected set.

So to check whether two sets are already connected, you compute their roots and simply compare them.

Initially every vertex is its own parent.
Connecting two nodes is modeled by computing their roots and making one of them the parent of the other.

There are additional tools to keep the depth of the tree low:

you'd always make the deeper tree the parent of the less deep one, and you could perform path compression to reduce the depth of found nodes.

All of this can be modeled in MySQL as well, but the performance behaviour might be different from an in-memory implementation.

So I'd suggest postponing that until you actually know that you need more performance, and have some data to test and compare different implementations.

Depth First Search in MySQL

1 Answers1