4

I have implemented a small cycle detection algorithm for a DAG in Scala. The 'return' bothers me - I'd like to have a version without the return...possible?

  def isCyclic() : Boolean = {
    lock.readLock().lock()
    try {
      nodes.foreach(node => node.marker = 1)
      nodes.foreach(node => {if (1 == node.marker && visit(node)) return true})
    } finally {
      lock.readLock().unlock()
    }
    false
  }

  private def visit(node: MyNode): Boolean = {
    node.marker = 3

    val nodeId = node.id
    val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
    children.foreach(child => {
      if (3 == child.marker || (1 == child.marker && visit(child))) return true
    })

    node.marker = 2

    false
  }
jts
  • 715
  • 7
  • 22

5 Answers5

4

Summary:
I have originated two solutions as generic FP functions which detect cycles within a directed graph. And per your implied preference, the use of an early return to escape the recursive function has been eliminated. The first, isCyclic, simply returns a Boolean as soon as the DFS (Depth First Search) repeats a node visit. The second, filterToJustCycles, returns a copy of the input Map filtered down to just the nodes involved in any/all cycles, and returns an empty Map when no cycles are found.

Details:
For the following, please Consider a directed graph encoded as such:

val directedGraphWithCyclesA: Map[String, Set[String]] =
  Map(
      "A" -> Set("B", "E", "J")
    , "B" -> Set("E", "F")
    , "C" -> Set("I", "G")
    , "D" -> Set("G", "L")
    , "E" -> Set("H")
    , "F" -> Set("G")
    , "G" -> Set("L")
    , "H" -> Set("J", "K")
    , "I" -> Set("K", "L")
    , "J" -> Set("B")
    , "K" -> Set("B")
  )

In both functions below, the type parameter N refers to whatever "Node" type you care to provide. It is important the provided "Node" type be both immutable and have stable equals and hashCode implementations (all of which occur automatically with use of immutable case classes).


The first function, isCyclic, is a similar in nature to the version of the solution provided by @the-archetypal-paul. It assumes the directed graph has been transformed into a Map[N, Set[N]] where N is the identity of a node in the graph.

If you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.

Calling the isCyclic function as such:

val isCyclicResult = isCyclic(directedGraphWithCyclesA)

will return:

`true`

No further information is provided. And the DFS (Depth First Search) is aborted at detection of the first repeated visit to a node.

def isCyclic[N](nsByN: Map[N, Set[N]]) : Boolean = {
  def hasCycle(nAndNs: (N, Set[N]), visited: Set[N] = Set[N]()): Boolean =
    if (visited.contains(nAndNs._1))
      true
    else
      nAndNs._2.exists(
        n =>
          nsByN.get(n) match {
            case Some(ns) =>
              hasCycle((n, ns), visited + nAndNs._1)
            case None =>
              false
          }
      )
  nsByN.exists(hasCycle(_))
}

The second function, filterToJustCycles, uses the set reduction technique to recursively filter away unreferenced root nodes in the Map. If there are no cycles in the supplied graph of nodes, then .isEmpty will be true on the returned Map. If however, there are any cycles, all of the nodes required to participate in any of the cycles are returned with all of the other non-cycle participating nodes filtered away.

Again, if you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.

Calling the filterToJustCycles function as such:

val cycles = filterToJustCycles(directedGraphWithCyclesA)

will return:

Map(E -> Set(H), J -> Set(B), B -> Set(E), H -> Set(J, K), K -> Set(B))

It's trivial to then create a traversal across this Map to produce any or all of the various cycle pathways through the remaining nodes.

def filterToJustCycles[N](nsByN: Map[N, Set[N]]): Map[N, Set[N]] = {
  def recursive(nsByNRemaining: Map[N, Set[N]], referencedRootNs: Set[N] = Set[N]()): Map[N, Set[N]] = {
    val (referencedRootNsNew, nsByNRemainingNew) = {
      val referencedRootNsNewTemp =
        nsByNRemaining.values.flatten.toSet.intersect(nsByNRemaining.keySet)
      (
          referencedRootNsNewTemp
        , nsByNRemaining.collect {
            case (t, ts) if referencedRootNsNewTemp.contains(t) && referencedRootNsNewTemp.intersect(ts.toSet).nonEmpty =>
              (t, referencedRootNsNewTemp.intersect(ts.toSet))
          }
        )
    }
    if (referencedRootNsNew == referencedRootNs)
      nsByNRemainingNew
    else
      recursive(nsByNRemainingNew, referencedRootNsNew)
  }
  recursive(nsByN)
}

So, how does one generically transform a custom directed graph implementation into a Map[N, Set[N]]?

In essence, "Go Scala case classes!"

First, let's define an example case of a real node in a pre-existing directed graph:

class CustomNode (
    val equipmentIdAndType: String //"A387.Structure" - identity is embedded in a string and must be parsed out
  , val childrenNodes: List[CustomNode] //even through Set is implied, for whatever reason this implementation used List
  , val otherImplementationNoise: Option[Any] = None
)

Again, this is just an example. Yours could involve subclassing, delegation, etc. The purpose is to have access to a something that will be able to fetch the two essential things to make this work:

  • the identity of a node; i.e. something to distinguish it and makes it unique from all other nodes in the same directed graph
  • a collection of the identities of the immediate children of a specific node - if the specific node doesn't have any children, this collection will be empty

Next, we define a helper object, DirectedGraph, which will contain the infrastructure for the conversion:

  • Node: an adapter trait which will wrap CustomNode
  • toMap: a function which will take a List[CustomNode] and convert it to a Map[Node, Set[Node]] (which is type equivalent to our target type of Map[N, Set[N]])

Here's the code:

object DirectedGraph {
  trait Node[S, I] {
    def source: S
    def identity: I
    def children: Set[I]
  }

  def toMap[S, I, N <: Node[S, I]](ss: List[S], transformSToN: S => N): Map[N, Set[N]] = {
    val (ns, nByI) = {
      val iAndNs =
        ss.map(
          s => {
            val n =
              transformSToN(s)
            (n.identity, n)
          }
        )
      (iAndNs.map(_._2), iAndNs.toMap)
    }
    ns.map(n => (n, n.children.map(nByI(_)))).toMap
  }
}

Now, we must generate the actual adapter, CustomNodeAdapter, which will wrap each CustomNode instance. This adapter uses a case class in a very specific way; i.e. specifying two constructor parameters lists. It ensures the case class conforms to a Set's requirement that a Set member have correct equals and hashCode implementations. For more details on why and how to use a case class this way, please see this StackOverflow question and answer:

object CustomNodeAdapter extends (CustomNode => CustomNodeAdapter) {
  def apply(customNode: CustomNode): CustomNodeAdapter =
    new CustomNodeAdapter(fetchIdentity(customNode))(customNode) {}

  def fetchIdentity(customNode: CustomNode): String =
    fetchIdentity(customNode.equipmentIdAndType)

  def fetchIdentity(eiat: String): String =
    eiat.takeWhile(char => char.isLetter || char.isDigit)
}
abstract case class CustomNodeAdapter(identity: String)(customNode: CustomNode) extends DirectedGraph.Node[CustomNode, String] {
  val children =
    customNode.childrenNodes.map(CustomNodeAdapter.fetchIdentity).toSet
  val source =
    customNode
}

We now have the infrastructure in place. Let's define a "real world" directed graph consisting of CustomNode:

val customNodeDirectedGraphWithCyclesA =
  List(
      new CustomNode("A.x", List("B.a", "E.a", "J.a"))
    , new CustomNode("B.x", List("E.b", "F.b"))
    , new CustomNode("C.x", List("I.c", "G.c"))
    , new CustomNode("D.x", List("G.d", "L.d"))
    , new CustomNode("E.x", List("H.e"))
    , new CustomNode("F.x", List("G.f"))
    , new CustomNode("G.x", List("L.g"))
    , new CustomNode("H.x", List("J.h", "K.h"))
    , new CustomNode("I.x", List("K.i", "L.i"))
    , new CustomNode("J.x", List("B.j"))
    , new CustomNode("K.x", List("B.k"))
    , new CustomNode("L.x", Nil)
  )

Finally, we can now do the conversion which looks like this:

val transformCustomNodeDirectedGraphWithCyclesA =
  DirectedGraph.toMap[CustomNode, String, CustomNodeAdapter](customNodes1, customNode => CustomNodeAdapter(customNode))

And we can take transformCustomNodeDirectedGraphWithCyclesA, which is of type Map[CustomNodeAdapter,Set[CustomNodeAdapter]], and submit it to the two original functions.

Calling the isCyclic function as such:

val isCyclicResult = isCyclic(transformCustomNodeDirectedGraphWithCyclesA)

will return:

`true`

Calling the filterToJustCycles function as such:

val cycles = filterToJustCycles(transformCustomNodeDirectedGraphWithCyclesA)

will return:

Map(
    CustomNodeAdapter(B) -> Set(CustomNodeAdapter(E))
  , CustomNodeAdapter(E) -> Set(CustomNodeAdapter(H))
  , CustomNodeAdapter(H) -> Set(CustomNodeAdapter(J), CustomNodeAdapter(K))
  , CustomNodeAdapter(J) -> Set(CustomNodeAdapter(B))
  , CustomNodeAdapter(K) -> Set(CustomNodeAdapter(B))
)

And if needed, this Map can then be converted back to Map[CustomNode, List[CustomNode]]:

cycles.map {
  case (customNodeAdapter, customNodeAdapterChildren) =>
    (customNodeAdapter.source, customNodeAdapterChildren.toList.map(_.source))
}

If you have any questions, issues or concerns, please let me know and I will address them ASAP.

Community
  • 1
  • 1
chaotic3quilibrium
  • 5,661
  • 8
  • 53
  • 86
4

Yes, by using '.find' instead of 'foreach' + 'return':

http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Seq

def isCyclic() : Boolean = {
  def visit(node: MyNode): Boolean = {
      node.marker = 3

      val nodeId = node.id
      val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
      val found = children.exists(child => (3 == child.marker || (1 == child.marker && visit(child))))

      node.marker = 2

      found
  }

  lock.readLock().lock()
  try {
    nodes.foreach(node => node.marker = 1)
    nodes.exists(node => node.marker && visit(node))
  } finally {
    lock.readLock().unlock()
  }
}
Alois Cochard
  • 9,812
  • 2
  • 29
  • 30
2

I think the problem can be solved without changing the state of the node with the marker field. The following is a rough code of what i think the isCyclic should look like. I am currently storing the node objects which are visited instead you can store the node ids if the node doesnt have equality based on node id.

def isCyclic() : Boolean = nodes.exists(hasCycle(_, HashSet()))

def hasCycle(node:Node, visited:Seq[Node]) = visited.contains(node) || children(node).exists(hasCycle(_,  node +: visited))


def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
Udayakumar Rayala
  • 2,264
  • 1
  • 20
  • 17
  • I think you may want something with faster .contains than a Seq? Or hasCycle is O(number-of-nodes^2) when it could be O(number-of-nodes) – The Archetypal Paul Sep 13 '11 at 12:33
  • 1
    Also, this pattern of creating a new (in this case) visited for each recursive call always feels to me like being functional just for the sake of it and generates reams of garbage. Having a single mutable 'visited' (local to isCyclic, and making hasCycle local to isCyclic too) seems more efficient, and cleaner if not pure fp... – The Archetypal Paul Sep 13 '11 at 12:36
  • This is a nice solution. I need to give this a try. – jts Sep 13 '11 at 13:00
  • @Paul I like the immutable data structures because it simplifies the design and i can concentrate on the function i am working on. This might not be very performant as having a mutable visited nodes. But when compared to the readability, the performance hit will be negligible. – Udayakumar Rayala Sep 13 '11 at 13:18
  • @Paul I agree with your comment that the hasCycle is not O(n^2). This can be made faster by changing the visited nodes as a HashSet instead of Seq. Thanks for that. – Udayakumar Rayala Sep 13 '11 at 13:20
  • @Udayakumar Rayala, readability isn't very different in these two cases (I claim - see my answer). I would argue that the mutable one might even be more readable as it's more obvious (to me, anyway) what the 'visited' variable is for. – The Archetypal Paul Sep 13 '11 at 13:59
  • @Paul I think both versions are equally readable, just that the mutable version is more familiar to us than the immutable one. Regarding your comment about creating new Seqs generating garbage, I think you are not aware that all scala immutable collectins are Persistent data structures(http://en.wikipedia.org/wiki/Persistent_data_structure). Hence, the performance of both the code snippets are same and this code is functional as well. – Udayakumar Rayala Sep 13 '11 at 21:33
  • @Udayakumar, that's exactly my point! From the page "persistent data structure is a data structure which always preserves the previous version of itself when it is modified;" - your version makes many copies of the visited collection, one for every recursive call. The previous version is no longer needed (so it's garbage). Mine has one mutable instance == less garbage. Whether the performance difference is noticeable in practice is an entirely different thing, of course... – The Archetypal Paul Sep 14 '11 at 08:11
  • @Paul Why do you worry about the garbage when the performance is not noticeable? – Udayakumar Rayala Sep 14 '11 at 14:44
  • I didn't say it wasn't noticeable, I said that was a different question. As a matter of good practice, however, I think one should be aware of how much garbage/memory one's code uses and - other things being equal - try not to use more than necessary – The Archetypal Paul Sep 14 '11 at 16:19
1

Answer added just to show that the mutable-visited isn't too unreadable either (untested, though!)

def isCyclic() : Boolean =
{
     var visited = HashSet()

     def hasCycle(node:Node) = {
        if (visited.contains(node)) {
           true
        } else {
           visited :+= node
           children(node).exists(hasCycle(_))
        }
    }
    nodes.exists(hasCycle(_))
}

def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
The Archetypal Paul
  • 41,321
  • 20
  • 104
  • 134
  • Would you agree that the first version with vals and no new would be the ideal FP implementation? – jts Sep 13 '11 at 14:06
  • @JTS, I guess I would wonder why you wanted an "ideal" FP implementation. For me, readability, performance and robustness are important. FP can be a good way of achieving readability and robustnes (and some times performance) but for me its a means to an end, and not the end itself. I'm not normally concerned whether it's "ideal FP". And there is as much a `new` in that first version as there is in mine - both call `Hashset()` which returns a new `HashSet` – The Archetypal Paul Sep 13 '11 at 19:48
0

If p = node => node.marker==1 && visit(node) and assuming nodes is a List you can pick any of the following:

  • nodes.filter(p).length>0
  • nodes.count(p)>0
  • nodes.exists(p) (I think the most relevant)

I am not sure of the relative complexity of each method and would appreciate a comment from fellow members of the community

Jamil
  • 2,150
  • 1
  • 19
  • 20