
I am trying to implement a regex to NFA converter. I have most of the code written, but I am struggling to find a way to build a graph with a cycle given my representation for states (nodes) and edges.

My graph representation is as follows:

type state =
| State of int * edge list (* Node ID and outgoing edges *)
| Match (* Match state for the NFA: no outgoing edges *)
and edge =
| Edge of state * string (* End state and label *)
| Epsilon of state (* End state *)

My function to convert a regex to an NFA is basically a pattern match on the type of regex, taking in the regex type and the "final state" (where all the outgoing edges for the NFA will go) and returning the "start state" of the (partially built) NFA for that regex. The NFA fragments are built by returning a State constructed with its outgoing edge list, where each edge's end state is constructed via a recursive call.

Most of the code is easy, but I am having trouble building the NFA for Kleene star and +, which require cycles in the graph. Given my representation I end up with something like:

let rec regex2nfa regex final_state =
  match regex with
  ... (* Other cases... *)
  | KleeneStar(re) ->
      let s = State(count, [Epsilon(regex2nfa re s); Epsilon(final_state)]) in
      s

Obviously this doesn't compile, as s is undefined at this point. However, I also cannot add the "rec" keyword, because the type checker will (rightfully) reject such a recursively defined value, and I can't get around this by using Lazy, because forcing the evaluation of "s" will recursively force it (again and again...). Basically I have a chicken and egg problem here - I need to pass the "state" reference before it is fully constructed to another state that will have an edge back to it, but of course the original state must be fully constructed to be passed in the recursive call.

Is there any way to do this without using references/mutable records? I would really like to keep this as functional as possible, but I don't see a way around this given the situation... Does anyone have suggestions?

Tomerikoo
  • For what it's worth, I've found it infeasible to deal with immutable cyclic structures in OCaml (due to its eagerness). I've done graph processing where each node contained the node ids of its successors, rather than the nodes themselves. Then I had a separate map of node ids to nodes. There is extra overhead of map lookup, but it worked for me. – Jeffrey Scofield May 07 '14 at 06:10
  • The compiler won't reject your recursive value definition if it is [statically constructive](http://caml.inria.fr/pub/docs/manual-ocaml-400/manual021.html). To be concrete, let's try with lazy values. In order to refer to the `s` variable you need to put it inside a lazy form. That means that your Epsilon constructor should accept `state Lazy.t`, and so should the other constructors. – ivg May 07 '14 at 06:50
  • How about the Inductive Graphs as defined by Erwig? – nlucaroni May 07 '14 at 13:06
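The id-based representation suggested in the first comment could look roughly like the sketch below. Edges store the id of their target instead of the target itself, and a separate map resolves ids to nodes, so a cycle is just an id that appears again. The names `node`, `nfa`, `star_a`, and `successors` are hypothetical and chosen only for illustration; this is one possible layout, not the commenter's actual code.

```ocaml
(* Sketch of an id-based NFA: edges name their target by id. *)
module IntMap = Map.Make (Int)

type edge =
  | Edge of int * string   (* target state id and label *)
  | Epsilon of int         (* target state id *)

type node =
  | State of edge list     (* outgoing edges *)
  | Match                  (* accepting state, no outgoing edges *)

type nfa = node IntMap.t

(* NFA for a*: state 0 branches to 1 and 2; state 1 loops back to 0 on "a". *)
let star_a : nfa =
  IntMap.empty
  |> IntMap.add 0 (State [Epsilon 1; Epsilon 2])
  |> IntMap.add 1 (State [Edge (0, "a")])
  |> IntMap.add 2 Match

(* Ids reachable from [id] in one step; unknown ids have no successors. *)
let successors (g : nfa) (id : int) : int list =
  match IntMap.find_opt id g with
  | Some (State edges) ->
      List.map (function Edge (t, _) | Epsilon t -> t) edges
  | Some Match | None -> []
```

The map stays fully immutable; the price is one lookup per edge followed, as the comment notes.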

1 Answer

You can create data structures with cycles without explicit references, using lazy types or functions. Indeed, both of them hide some form of mutability.

Here is an example of the simplest lazy structure that is more complex than a list:

type 'a tree = 'a tr Lazy.t
and  'a tr = Stem of 'a * 'a tree * 'a tree

let rec tree_with_loop : int tree =
  lazy (Stem (42,tree_with_loop,tree_with_loop))

But you should understand that with this kind of structure (i.e., one that contains cycles) you're stepping onto the shaky ground of infinity: any traversal function that neither bounds its depth nor tracks the nodes it has already visited will now diverge.
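To make the divergence concern concrete, here is a minimal sketch of a traversal over the cyclic lazy tree that terminates because it carries an explicit depth bound. The function name `sum_to_depth` is made up for this illustration; the type and value definitions are repeated so the snippet stands on its own.

```ocaml
type 'a tree = 'a tr Lazy.t
and  'a tr = Stem of 'a * 'a tree * 'a tree

let rec tree_with_loop : int tree =
  lazy (Stem (42, tree_with_loop, tree_with_loop))

(* Sum the labels of the first [depth] levels. Without the bound this
   recursion would never stop, since both children point back to the root. *)
let rec sum_to_depth (depth : int) (t : int tree) : int =
  if depth <= 0 then 0
  else
    match Lazy.force t with
    | Stem (x, l, r) ->
        x + sum_to_depth (depth - 1) l + sum_to_depth (depth - 1) r
```

With a bound of 2 the root is visited once and each of its two (identical) children once, so three labels are summed.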

And here is the same example, but without lazy:

type 'a tree = unit -> 'a tr
and  'a tr = Stem of 'a * 'a tree * 'a tree

let rec tree_with_loop : int tree =
  fun () -> Stem (42,tree_with_loop,tree_with_loop)

And here is an example of a slightly less infinite tree:

type 'a tree = 'a tr Lazy.t
and  'a tr =
  | Node of 'a
  | Tree of 'a tree * 'a tree

let rec tree_with_loop : int tree =
  lazy (Tree (tree_with_loop,
              lazy (Node 42)))
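Applied to the types from the question, the same idea might look like the sketch below: every edge holds a `state Lazy.t`, so the Kleene star case can refer to `s` inside its own definition. This is a sketch under assumptions, not the asker's code. The `regex` type is hypothetical (the question does not show it), and it is assumed that each regex node already carries a unique id, which stands in for the question's `count`.

```ocaml
type state =
  | State of int * edge list        (* node id and outgoing edges *)
  | Match                           (* accepting state *)
and edge =
  | Edge of state Lazy.t * string   (* delayed end state and label *)
  | Epsilon of state Lazy.t         (* delayed end state *)

(* Hypothetical regex type; the int is an assumed pre-assigned node id. *)
type regex =
  | Char of int * string
  | KleeneStar of int * regex

let rec regex2nfa (re : regex) (final : state Lazy.t) : state Lazy.t =
  match re with
  | Char (id, c) ->
      lazy (State (id, [Edge (final, c)]))
  | KleeneStar (id, inner) ->
      (* [s] is used only under [lazy], so the recursive definition is
         accepted, and building the fragment never forces [s] itself. *)
      let rec s =
        lazy (State (id, [Epsilon (regex2nfa inner s); Epsilon final]))
      in
      s

(* NFA for a*, ending in the Match state. *)
let star_a : state Lazy.t =
  regex2nfa (KleeneStar (0, Char (1, "a"))) (lazy Match)
```

Forcing `star_a` yields the branching state; following its first epsilon edge and then the "a" edge leads back to the very same state, which is the cycle the question asks for. Compare such states with physical equality (`==`) rather than `=`, since structural comparison is not meant for values containing lazy suspensions.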
ivg
  • This is ABSOLUTELY what I was looking for, thank you very much! I had tried using Lazy before but didn't really understand how it worked; this explanation (and your other comment) made it more clear. It worked really well for Kleene star, although for + I had to end up using a reference because you have to return the fully evaluated expression... but I was still able to get both to work successfully, and my program is still MOSTLY functional :) – Alex Landau May 07 '14 at 07:32