
I am attempting to create a Scala method that will take one parent group of parentheses, represented as a String, and then map each subgroup of parentheses to a different letter. It should then put these in a Map and return it, so basically I would call the method like this:

val s = "((2((x+3)+6)))"
val map = mapParentheses(s)

Where s could contain any number of sets of parentheses, and the Map returned should contain:

"(x+3)" -> 'a'

"(a+6)" -> 'b'

"(2b)" -> 'c'

"(c)" -> 'd'

That way, elsewhere in my program I can recall 'd' and get "(c)", which will become "((2b))", then "((2(a+6)))", and finally "((2((x+3)+6)))". The string sent to mapParentheses will never have unmatched parentheses or extra characters outside the main parent parentheses, so the following items will never be sent:

  • "(fsf)a" because the a is outside the parent parentheses
  • "(a(aa))(a)" because the (a) is outside the parent parentheses
  • "((a)" because the parentheses are unmatched
  • ")a(" because the parentheses are unmatched
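Going the other way (recalling 'd' and expanding it back to the original string) is a small recursive substitution. The sketch below is not part of the question: `ExpandDemo` and `expand` are hypothetical names, it assumes the returned map has exactly the shape listed above, and it assumes the placeholder letters never also occur as ordinary characters in the input (a variable named `a` inside the original expression would be expanded by mistake).

```scala
object ExpandDemo {
  // Hypothetical helper: rebuild the full text behind a placeholder letter.
  // `byChar` is the question's map inverted (letter -> group text).
  // Assumes placeholder letters never occur as ordinary characters in the input.
  def expand(key: Char, byChar: Map[Char, String]): String =
    byChar(key)
      .map(ch => if (byChar.contains(ch)) expand(ch, byChar) else ch.toString)
      .mkString

  def main(args: Array[String]): Unit = {
    // The map from the question: group text -> placeholder letter.
    val map = Map("(x+3)" -> 'a', "(a+6)" -> 'b', "(2b)" -> 'c', "(c)" -> 'd')
    val byChar = map.map(_.swap) // invert to letter -> group text
    println(expand('d', byChar)) // prints ((2((x+3)+6)))
  }
}
```

Each placeholder is replaced by its group text, and that text is expanded in turn, so expanding 'd' walks through "(c)", "((2b))" and "((2(a+6)))" before reaching the original string.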

So I was wondering if anyone knew of an easy (or not easy) way of creating this mapParentheses method.

Seren
  • Can you have multiple parentheticals at the same level (e.g., `((x + 1) + (y + 2))`)? – Travis Brown Sep 15 '12 at 22:18
  • @TravisBrown - Doesn't matter. The strings are guaranteed to be correct, and the general solution is as easy as the one that assumes there won't be multiple blocks at the same level. – Rex Kerr Sep 15 '12 at 22:22
  • @RexKerr: You can write a slightly nicer parser combinator version if you know you've only got one per level. – Travis Brown Sep 15 '12 at 23:36
  • @TravisBrown - Okay, you've demonstrated as much, I agree. – Rex Kerr Sep 15 '12 at 23:45
  • @TravisBrown - Yes, any combinations of parentheses are possible, but the overall string will be encased in a "parent" set of parentheses – Seren Sep 16 '12 at 05:09

3 Answers


You can do this pretty easily with Scala's parser combinators. First the imports and some simple data structures:

import scala.collection.mutable.Queue
import scala.util.parsing.combinator._

sealed trait Block {
  def text: String
}

case class Stuff(text: String) extends Block

case class Paren(m: List[(String, Char)]) extends Block {
  val text = m.head._2.toString
  def toMap = m.map { case (k, v) => "(" + k + ")" -> v }.toMap
}

I.e., a block represents a substring of the input that is either some non-parenthetical stuff or a parenthetical.

Now for the parser itself:

class ParenParser(fresh: Queue[Char]) extends RegexParsers {
  val stuff: Parser[Stuff] = "[^\\(\\)]+".r ^^ (Stuff(_))

  def paren: Parser[Paren] = ("(" ~> insides <~ ")") ^^ {
    case (s, m) => Paren((s -> fresh.dequeue) :: m)
  }

  def insides: Parser[(String, List[(String, Char)])] =
    rep1(paren | stuff) ^^ { blocks =>
      val s = blocks.map(_.text).mkString // collection.breakOut no longer exists in Scala 2.13
      val m = blocks.collect {
        case Paren(n) => n
      }.foldLeft(List.empty[(String, Char)])(_ ++ _)
      (s, m)
    }

  def parse(input: String) = this.parseAll(paren, input).get.toMap
}

Using get in the last line is very much not ideal, but is justified by your assertion that we can expect well-formed input.

Now we can create a new parser and pass in a mutable queue with some fresh variables:

val parser = new ParenParser(Queue('a', 'b', 'c', 'd', 'e', 'f'))

And now try out your test string:

scala> println(parser parse "((2((x+3)+6)))")
Map((c) -> d, (2b) -> c, (a+6) -> b, (x+3) -> a)

As desired. A more interesting exercise (left to the reader) would be to thread some state through the parser to avoid the mutable queue.
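One way to do that without the parser combinator library is a hand-written recursive descent that hands back the next unused letter alongside each result, instead of dequeuing from a shared `Queue`. This is a swapped-in technique, not the combinator parser above; `PureParenMapper` and `group` are hypothetical names, it assumes well-formed input exactly as the question guarantees, and it does not handle running past 'z'.

```scala
object PureParenMapper {
  private type Pairs = List[(String, Char)]

  // Parses one group starting at s(pos) == '('. The next unused letter is
  // passed in and a new one handed back, so no mutable queue is needed.
  // Returns (index after the closing ')', letter assigned to this group,
  //          group-text -> letter pairs found so far, next unused letter).
  // Assumes well-formed input, as the question guarantees.
  def group(s: String, pos: Int, next: Char): (Int, Char, Pairs, Char) = {
    // Walk the inside of the group, replacing each nested group by its letter.
    @annotation.tailrec
    def loop(i: Int, acc: String, pairs: Pairs, n: Char): (Int, String, Pairs, Char) =
      s(i) match {
        case ')' => (i + 1, acc, pairs, n)
        case '(' =>
          val (j, letter, inner, n2) = group(s, i, n)
          loop(j, acc + letter, pairs ++ inner, n2)
        case c => loop(i + 1, acc + c, pairs, n)
      }

    val (end, inside, pairs, n) = loop(pos + 1, "", Nil, next)
    val text = "(" + inside + ")"
    // Inner groups already took their letters, so this group gets the next one.
    (end, n, pairs :+ (text -> n), (n + 1).toChar)
  }

  def mapParentheses(s: String): Map[String, Char] = group(s, 0, 'a')._3.toMap
}
```

`PureParenMapper.mapParentheses("((2((x+3)+6)))")` should give the same four pairs as the parser above. Because each call returns the next free letter, sibling groups such as those in `((x+1)+(y+2))` also get distinct letters, assigned left to right.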

Travis Brown
  • Wouldn't it be better to take an `Iterator[Char]` (or a `=> Char`) so you could generate an infinite source if you wanted? – Rex Kerr Sep 16 '12 at 00:14

Classic recursive parsing problem. It's handy to have a small class hierarchy to hold the different pieces (plain text and parenthesized groups). We'll add a few utility methods to help us out later.

trait Part {
  def text: String
  override def toString = text
}
class Text(val text: String) extends Part {}
class Parens(val contents: Seq[Part]) extends Part {
  val text = "(" + contents.mkString + ")"
  def mapText(m: Map[Parens, Char]) = {
    val inside = contents.collect{
      case p: Parens => m(p).toString
      case x => x.toString
    }
    "(" + inside.mkString + ")"
  }
  override def equals(a: Any) = a match {
    case p: Parens => text == p.text
    case _ => false
  }
  override def hashCode = text.hashCode
}

Now you need to parse into these things:

def str2parens(s: String): (Parens, String) = {
  def fail = throw new Exception("Wait, you told me the input would be perfect.")
  if (s(0) != '(') fail
  def parts(s: String, found: Seq[Part] = Vector.empty): (Seq[Part], String) = {
    if (s(0)==')') (found,s)
    else if (s(0)=='(') {
      val (p,s2) = str2parens(s)
      parts(s2, found :+ p)
    }
    else {
      val (tx,s2) = s.span(c => c != '(' && c != ')')
      parts(s2, found :+ new Text(tx))
    }
  }
  val (inside, more) = parts(s.tail)
  if (more(0)!=')') fail
  (new Parens(inside), more.tail)
}

Now we've got the whole thing parsed. So let's find all the bits.

def findParens(p: Parens): Set[Parens] = {
  val inside = p.contents.collect{ case q: Parens => findParens(q) }
  inside.foldLeft(Set(p)){_ | _}
}

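Now we can build the map you want. Sorting by text length guarantees that every group gets a later letter than the groups nested inside it, since an enclosing group's text is always longer than anything it contains.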

def mapParentheses(s: String) = {
  val (p,_) = str2parens(s)
  val pmap = findParens(p).toSeq.sortBy(_.text.length).zipWithIndex.toMap
  val p2c = pmap.map { case (q, i) => q -> ('a' + i).toChar } // a real Map (mapValues returns a MapView in 2.13)
  p2c.map{ case(p,c) => (p.mapText(p2c), c) }.toMap
}

Evidence that it works:

scala> val s = "((2((x+3)+6)))"
s: java.lang.String = ((2((x+3)+6)))

scala> val map = mapParentheses(s)
map: scala.collection.immutable.Map[java.lang.String,Char] =
  Map((x+3) -> a, (a+6) -> b, (2b) -> c, (c) -> d)

I will leave it as an exercise to the reader to figure out how it works, with the hint that recursion is a really powerful way to parse recursive structures.

Rex Kerr
  • Why suffer through all this `if (x(0)=='(')`-ing when you've got parser combinators in the standard library? – Travis Brown Sep 15 '12 at 23:44
  • @TravisBrown - Because the cognitive burden of adding a parsing library when you have a nearly trivial task is not worth it if you want to understand how the solution actually works (rather than _that_ it works). – Rex Kerr Sep 15 '12 at 23:49
  • Fair enough, although I think the parser combinator approach actually makes the structure of the problem a lot clearer, even in a fairly simple case like this. – Travis Brown Sep 15 '12 at 23:54
  • @TravisBrown - It certainly makes some parts clearer, but there's a fair bit of "magic" to understand. Once you understand both the surface level and inner magic, I agree it's a better way to express the logical structure of the problem. – Rex Kerr Sep 16 '12 at 00:12
def parse(s: String, 
  c: Char = 'a', out: Map[Char, String] = Map() ): Option[Map[Char, String]] =
  """\([^\(\)]*\)""".r.findFirstIn(s) match {
    case Some(m) => parse(s.replace(m, c.toString), (c + 1).toChar , out + (c -> m))
    case None if s.length == 1 => Some(out)
    case _ => None
  }

This outputs an Option containing a Map if the input parses, which is better than throwing an exception if it doesn't. I suspect you really wanted a map from Char to String, so that's what this outputs. c and out are default parameters, so you don't need to supply them yourself. The regex just means "any number of characters that aren't parens, enclosed in parens" (the paren characters need to be escaped with "\"), so every match is an innermost group and letters are assigned from the inside out. findFirstIn finds the first match and returns an Option[String], which we can pattern match on, replacing that string with the relevant character and recursing. Once no group is left, well-formed input has collapsed to a single character; anything longer means the input was malformed, so we return None.

val s = "((2((x+3)+6)))"
parse(s)  //Some(Map(a -> (x+3), b -> (a+6), c -> (2b), d -> (c)))
parse("(a(aa))(a)") //None
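To see the inside-out rewriting step by step, the same loop can record each intermediate string instead of building the map. This is a small sketch: `RegexTrace` and `steps` are hypothetical names, and it reuses the innermost-group regex from `parse` unchanged.

```scala
object RegexTrace {
  // Same pattern as parse: "(", any run of non-parenthesis chars, ")".
  // Any match is therefore an innermost group.
  private val innermost = """\([^\(\)]*\)""".r

  // Replays parse's rewriting loop, returning every intermediate string.
  def steps(s: String, c: Char = 'a'): List[String] =
    innermost.findFirstIn(s) match {
      case Some(m) => s :: steps(s.replace(m, c.toString), (c + 1).toChar)
      case None    => List(s)
    }
}
```

For the question's input this lists "((2((x+3)+6)))", "((2(a+6)))", "((2b))", "(c)" and finally "d"; that single leftover character is exactly the `s.length == 1` success case in `parse`.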
Luigi Plinge