
Recently I have been trying to write a regex interpreter in Haskell. I created a new data type with constructors for all the constructs I need (sequence, *, ^, intervals, etc.) and then defined a matcher function. It works wonders, but my problem is that I have to convert the input (a String, for example "a(b*)(c|d)ef") to my data type ("Seq (Sym a) (Seq (Rep (Sym b)) (Seq (Or (Sym c) (Sym d)) (Sym ef)))"). This is the part I am having trouble with (I tried creating another data type, a parse tree, but I failed completely). Any ideas on how I could solve it?

rid
Iulia Muntianu
  • In case you're not building this just for fun, there is also Text.Regex – Jani Hartikainen Apr 18 '12 at 12:52
  • http://www.haskell.org/haskellwiki/Parsec: I don't know its details, but it's a really good library for parsing, and playing with it also teaches you many things about monads. – Riccardo T. Apr 18 '12 at 12:53

2 Answers


The canonical approach is to use a parser combinator library, such as Parsec. Parser combinator libraries (like parser generators) let you write a description of your grammar and get back a parser that turns strings into values of your data type, i.e. the abstract syntax tree of the regex.

You just have to encode your grammar as Parsec parsers, typically one small parser per grammar rule.

As an example, see this previous SO question: Using Parsec to parse regular expressions
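
To make the idea concrete, here is a minimal sketch of what such a parser could look like. The `Regex` type below is an assumption modelled on the constructors mentioned in the question (`Sym`, `Seq`, `Or`, `Rep`), with `Sym` holding a single `Char`; the grammar covers only literals, grouping, `|` and `*`, and the names `regex`, `term`, `factor`, `atom` and `parseRegex` are made up for this illustration. It is not taken from the linked question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST; the asker's real type may differ.
data Regex
  = Sym Char          -- a single literal character
  | Seq Regex Regex   -- concatenation
  | Or Regex Regex    -- alternation, written '|'
  | Rep Regex         -- repetition, written '*'
  deriving (Show, Eq)

-- regex ::= term ('|' term)*
regex :: Parser Regex
regex = term `chainl1` (Or <$ char '|')

-- term ::= factor+   (folded to the right, as in the question's example)
term :: Parser Regex
term = foldr1 Seq <$> many1 factor

-- factor ::= atom '*'*
factor :: Parser Regex
factor = do
  a     <- atom
  stars <- many (char '*')
  -- wrap the atom in one Rep per trailing star
  return (foldl (\r _ -> Rep r) a stars)

-- atom ::= '(' regex ')' | any character except ( ) | *
atom :: Parser Regex
atom = between (char '(') (char ')') regex
   <|> Sym <$> noneOf "()|*"

-- Parse a whole string; leftover input is reported as an error.
parseRegex :: String -> Either ParseError Regex
parseRegex = parse (regex <* eof) "<regex>"
```

With these definitions, `parseRegex "a(b*)(c|d)ef"` should yield `Right (Seq (Sym 'a') (Seq (Rep (Sym 'b')) (Seq (Or (Sym 'c') (Sym 'd')) (Seq (Sym 'e') (Sym 'f')))))`. Each grammar rule maps to one parser, so adding further constructs such as intervals would mean adding further alternatives to `atom` or `factor`.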

Don Stewart

Here is an interesting article (written as a play) on implementing regular expressions:

A Play on Regular Expressions

JJJ