4

My goal was to write a function to parse string of nested parentheses into a corresponding list:

parseParens "()" --> []
parseParens "(())" --> [[]]
parseParens "((()()))" --> [[[],[]]]

First off I discovered that I can't specify easily define a type of the return value. I could do something like:

parseParens :: String -> [[[[t]]]]

But how do I say that it's infinitely nested? I guess Haskell doesn't allow that.

My solution

I came up with my own data type:

data InfiniteList = EmptyList | Cons InfiniteList InfiniteList deriving (Show)

And a parser function that uses this:

parseParens :: String -> InfiniteList
parseParens ('(':xs) =
    if remainder == ""
        then result
        else error "Unbalanced parenthesis"
    where (result, remainder) = parseToClose EmptyList xs
parseParens _ = error "Unbalanced parenthesis"

parseToClose :: InfiniteList -> String -> (InfiniteList, String)
parseToClose acc "" = error "Unbalanced parenthesis!"
parseToClose acc (')':xs) = (acc, xs)
parseToClose acc ('(':xs) = parseToClose (concatInfLists acc (Cons result EmptyList)) remainder
    where (result, remainder) = parseToClose EmptyList xs

concatInfLists :: InfiniteList -> InfiniteList -> InfiniteList
concatInfLists EmptyList ys = ys
concatInfLists (Cons x xs) ys = Cons x (concatInfLists xs ys)

Working like so:

parseParens "()" --> EmptyList
parseParens "(())" --> Cons EmptyList EmptyList
parseParens "((()()))" --> Cons (Cons EmptyList (Cons EmptyList EmptyList)) EmptyList

How to improve?

There surely must be a better way to do this. Perhaps there's even a way to use the built-in List data type for this?

Rene Saarsoo
  • 13,580
  • 8
  • 57
  • 85
  • 5
    `data Nested a = Flat a | Nested (Nested [a])`. A `Nested a` value consists of _n_ `Nested` constructors, terminated by a `Flat` constructor containing an _n_-deep nested list of `a`s. – Benjamin Hodgson Nov 30 '17 at 15:39

1 Answers1

8

Edit: Fixed my mischaracterization of Benjamin's answer.

While the answer in @Benjamin Hodgson's comment:

data Nested a = Flat a | Nested (Nested [a]) deriving (Show)

gives a good way to represent a homogeneous list of arbitrary nesting depth (i.e., sort of like a sum type of [a] plus [[a]] plus [[[a]]] plus all the rest), it seems like an unusual representation for your problem, particularly in a case like:

parseParens "(()(()))"

where the nesting depth of the "child nodes" differs. This would be represented as:

Nested (Nested (Nested (Flat [[],[[]]]))) :: Nested a

so it literally allows you to represent the result of the parse as the desired list, given enough Nested constructors, but it has some odd properties. For example, the innermost empty lists actually have different types: the first is of type [[a]] while the second is of type [a].

As a alternative approach, I think the data type you actually want is probably just:

data Nested = N [Nested] deriving (Show)

where each node N is a (possibly empty) list of nodes. Then, you'll get:

> parseParens "()"
N []
> parseParens "(())"
N [N []]
> parseParens "((()()))"
N [N [N [],N []]]
> parseParens "(()(()))"
N [N [],N [N []]]

If you just ignore the N constructors in these results, the first three of these match your "corresponding list" test cases from the beginning of your question.

As a side note: the Nested data type above is actually a "rose tree" containing no data, equivalent to Tree () using the Tree data type from Data.Tree in the containers package.

Finally, I can't emphasize enough how helpful it is to learn and use a monadic parsing library, even for simple parsing jobs. Using the parsec library, for example, you can write a parser for your grammar in one line:

nested = N <$> between (char '(') (char ')') (many nested)

My full code for parseParens is:

import Data.Tree
import Text.Parsec
import Text.Parsec.String

data Nested = N [Nested] deriving (Show)

nested :: Parser Nested
nested = N <$> between (char '(') (char ')') (many nested)

parseParens :: String -> Nested
parseParens str =
  let Right result = parse (nested <* eof) "" str
  in  result
K. A. Buhr
  • 45,621
  • 3
  • 45
  • 71
  • Thanks. This `N []` thingie works well. I managed to easily plug it into my existing code. I'll definitely need to learn about these parsing libraries as well... though first I'll need to wrap my head around monads. – Rene Saarsoo Nov 30 '17 at 20:13
  • Is there any particular reason for creating new datatype. It seems to me parsing to list is enough. – user26732 Dec 01 '17 at 01:45
  • @user26732, the original issue was deciding what type to give `parseParens`. For example, If it had type `parseParens :: String -> [[[[a]]]]`, you could return plain lists, but only up to depth 4. – K. A. Buhr Dec 01 '17 at 14:37