
I am converting a context-free grammar into Greibach Normal Form (GNF). The main transformation (from Hopcroft & Ullman) is a sequence of iterations over the indexed variables of the grammar. It is essentially "stateless". I have implemented it as a sequence of folds over the appropriate indices (the implementation is fairly straightforward):

    gnf :: Ord a => Set (Rule a) -> Set (Rule a)
    gnf rl = foldl step1 rl [1..maxIndex rl]
     where step1 rl' k = foldl step2 rl' [1..k - 1]
            where step2 rl'' j = noLR k (subst rl'' k j)

`maxIndex rl` returns the maximum variable index in a set of rules; `subst rl k j` substitutes into the k-indexed rules using the rules whose right-hand side starts with a j-indexed variable. After performing `gnf`, I need to perform a final pass over the grammar in reverse order.

The problem is noLR, which transforms a grammar with left-recursive k-indexed rules. This is a "stateful" function, since a unique variable must be generated for each rule (or k-indexed rule) to which noLR is applied. So I wrote a stateful function

    noLR :: Ord a => Int -> Set (Rule a) -> State [Sym a] (Set (Rule a))
    noLR k rl = do
      (n:ns) <- get
      put ns
      let rl' = ... remove left recursion rl n ...
      return rl'

I can sequence together the noLR in order to update the n which noLR takes as a parameter. I'm not sure how to perform noLR inside step2 in the above function, though. I don't seem to be able to use the let ... in schema, because the stateful computation is embedded inside several recursive functions.

What I want to do is have n be some type of global variable, similar to an explicit threading of n, which I can call and update inside step2, which is why I originally wrote the function as a fold with eta-expansion (for n). Does anyone know how I could structure gnf inside the state monad to achieve this kind of effect? Except for the last computation in the fold, nothing else is "stateful," and I'm only comfortable using the state monad with "trivial" examples. I'm rather lost.

emi
  • see `foldM`. `gnf` will, of course, have to begin with a top-level `evalState` call. – sclv Nov 23 '10 at 19:18
  • Also look at Luke Palmer's [IO-free splittable supply](https://lukepalmer.wordpress.com/2009/09/14/io-free-splittable-supply). – atravers Jan 12 '22 at 01:19

2 Answers


In order to use noLR with the type you have given, you will have to rewrite your gnf function along the following lines:

    gnf :: Ord a => Set (Rule a) -> Set (Rule a)
    gnf rl = evalState (foldM step1 rl [1..maxIndex rl]) ( ... generate the initial state [Sym a] here ...)
     where step1 rl' k = foldM step2 rl' [1..k - 1]
            where step2 rl'' j = noLR k (subst rl'' k j)

Your state variable exists during the whole computation, and that fact has to be made explicit in the code.
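To make the shape of that rewrite concrete, here is a minimal, self-contained sketch of nested `foldM` calls run under a single `evalState`. Everything in it except the overall structure is an assumption for illustration: the `Sym` and `Rule` types are toy stand-ins, `subst` is the identity, `noLR` only records which fresh symbol it received, and the maximum index is passed in as a parameter instead of being computed by `maxIndex`.

```haskell
import Control.Monad (foldM)
import Control.Monad.State (State, evalState, state)
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical stand-ins for the question's Sym and Rule types.
data Sym = Var Int | Fresh Int deriving (Eq, Ord, Show)
type Rule = (Sym, [Sym])   -- (left-hand side, right-hand side)

-- Take the next unused symbol from an (assumed infinite) supply.
fresh :: State [Sym] Sym
fresh = state (\(n : ns) -> (n, ns))

-- Placeholder for the real left-recursion removal: it only adds a rule
-- showing which fresh symbol was handed out for index k.
noLR :: Int -> Set Rule -> State [Sym] (Set Rule)
noLR k rl = do
  n <- fresh
  return (Set.insert (n, [Var k]) rl)

-- Placeholder for the real substitution step (identity here).
subst :: Set Rule -> Int -> Int -> Set Rule
subst rl _ _ = rl

-- The supply is created once and threaded through both folds by foldM.
gnf :: Int -> Set Rule -> Set Rule
gnf maxIx rl = evalState (foldM step1 rl [1 .. maxIx]) (map Fresh [0 ..])
  where
    step1 rl' k = foldM (step2 k) rl' [1 .. k - 1]
    step2 k rl'' j = noLR k (subst rl'' k j)
```

In this sketch the pure `subst` needs no lifting: it is simply applied to the accumulator before the result is passed to the stateful `noLR`, so only the innermost step touches the supply.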

If all you need is that the newly-generated variable names don't collide with each other, then you can make noLR pure by generating a new symbol name from the indices k and j - something like "foo_42_16" for k == 42 and j == 16. If the input grammar already contains symbol names of that kind, you might be in trouble, however.

If you need your symbols to be unique within the grammar, then why not say just that?

    newSymbol :: Set (Rule a) -> Sym a
    newSymbol rl = ... find a symbol name not used in rl ...

This is definitely not efficient, though, unless you replace Set (Rule a) by a different type that allows you to implement the newSymbol operation more efficiently.
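As a rough illustration of this idea, the sketch below assumes a hypothetical representation in which variables carry an integer index, so that "a symbol not used in `rl`" can be taken to be the variable one above the largest index in use. The `Sym` and `Rule` types and the `varIndices` helper are inventions for this example, not the question's actual definitions.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical representation: numbered variables and character terminals.
data Sym = Var Int | Term Char deriving (Eq, Ord, Show)
type Rule = (Sym, [Sym])   -- (left-hand side, right-hand side)

-- All variable indices mentioned anywhere in the rule set.
-- Terminals are skipped by the pattern match in the generator.
varIndices :: Set Rule -> [Int]
varIndices rl = [i | (lhs, rhs) <- Set.toList rl, Var i <- lhs : rhs]

-- A variable whose index is one above the largest index in use.
-- The leading 0 keeps maximum total when the rule set has no variables.
newSymbol :: Set Rule -> Sym
newSymbol rl = Var (maximum (0 : varIndices rl) + 1)
```

Each call scans the whole rule set, which is the inefficiency mentioned above; storing the largest index alongside the set would avoid the rescan.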

wolfgang
  • The second suggestion is quite nice! Pair the set with an int representing the max symbol used within it. That's in a sense just making the monad explicit, but in this case it feels much more elegant. – sclv Nov 23 '10 at 23:15
  • Re-evaluating the next variable is, in this case, probably the best solution. It makes the code cleaner overall. But the example with `foldM` cleared up its use for me: it pipes the `s` and `a` values through a left fold, accumulating on `a`. When a "stateful" computation is encountered, it is performed and the `s` is updated. Pretty cool, thanks! – emi Nov 24 '10 at 21:52

I would try to rewrite noLR to be pure. Are you sure you cannot rewrite it to generate a symbol which depends only on the name of the rule and its index (or something similar)?

    noLR k j = noLR' k j $ newSymbol k j
        where newSymbol k j = ... -- some concatenation of k and j
              noLR' k j sym = ... -- your now pure function
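One possible way to fill in `newSymbol`, assuming symbols are represented by string names (the `Sym` newtype and the `fresh_` prefix below are hypothetical), is to join the two indices with a separator so that distinct `(k, j)` pairs always produce distinct names:

```haskell
-- Hypothetical symbol type with string names.
newtype Sym = Sym String deriving (Eq, Ord, Show)

-- Build a name from both indices. The underscore between them matters:
-- without it, (4, 21) and (42, 1) would both produce "fresh_421".
newSymbol :: Int -> Int -> Sym
newSymbol k j = Sym ("fresh_" ++ show k ++ "_" ++ show j)
```

This only guarantees that the generated names differ from one another; it does not protect against an input grammar that already uses a name of the same form.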
luispedro
  • Because the folds are accumulating sets of rules, I could rewrite noLR to re-evaluate the next possible variable from the current set of rules. Because of the way this is set up, finding the next variable is a fairly simple fold operation over the rule set. As a learning exercise, though, I come across this problem a lot (the only stateful computation I need is embedded "deep inside a recursion"). I'd like to know how someone would go about expressing it monadically. – emi Nov 23 '10 at 22:25