17

I'm writing my first program with Parsec. I want to parse MySQL schema dumps and would like a nice way to parse certain keywords case-insensitively. Here is some code showing the approach I'm using to parse "CREATE" or "create". Is there a better way to do this? An answer that doesn't resort to buildExpressionParser would be best. I'm taking baby steps here.

  p_create_t :: GenParser Char st Statement
  p_create_t = do
      x <- (string "CREATE" <|> string "create")
      xs <- manyTill anyChar (char ';')
      return $ CreateTable (x ++ xs) []  -- refine later
dan
  • 5
    I'm assuming that `map toLower` on the input before even running the parser isn't an option? Also, I'd expect "case insensitive" to also match "Create", "CrEaTe", "CREATe", or any other variation, which your example rejects. Which do you want? – C. A. McCann Oct 17 '12 at 15:10
  • That does work. Thanks. I hadn't thought of that! – dan Oct 17 '12 at 15:12
  • 1
    @dan Just beware that if your input contains strings, they'll be lowercased too. For example, if any of your columns contain default string values. – Petr Oct 17 '12 at 17:22

5 Answers

21

You can build the case-insensitive parser out of character parsers.

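import Data.Char (toLower, toUpper)
import Text.ParserCombinators.Parsec

-- Match the lowercase or uppercase form of 'c'
caseInsensitiveChar :: Char -> GenParser Char st Char
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)

-- Match the string 's', accepting either lowercase or uppercase form of each character
caseInsensitiveString :: String -> GenParser Char st String
caseInsensitiveString s = try (mapM caseInsensitiveChar s) <?> "\"" ++ s ++ "\""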
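As a rough illustration of how these helpers might be used for a keyword, here is a minimal sketch. The helpers are repeated so the block stands alone; `createKeyword` and `parseCreate` are hypothetical names introduced only for this example, and the `Parser` alias from `Text.Parsec.String` is an assumption about the input type.

```haskell
import Data.Char (toLower, toUpper)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same idea as the helpers above, repeated so this block is self-contained.
caseInsensitiveChar :: Char -> Parser Char
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)

caseInsensitiveString :: String -> Parser String
caseInsensitiveString s = try (mapM caseInsensitiveChar s) <?> show s

-- Hypothetical keyword parser: accept "create" in any casing,
-- then skip trailing whitespace.
createKeyword :: Parser String
createKeyword = caseInsensitiveString "create" <* spaces

-- Hypothetical helper: run the keyword parser on a plain String.
parseCreate :: String -> Either ParseError String
parseCreate = parse createKeyword "<input>"
```

With these definitions, `parseCreate "CrEaTe TABLE t;"` should yield `Right "CrEaTe"`: the result keeps the casing found in the input, because each character parser returns the character it actually matched.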
Heatsink
9

Repeating what I said in a comment, as it was apparently helpful:

The sledgehammer solution here is to map toLower over the entire input before running the parser, then do all your keyword matching in lowercase.

This presents obvious difficulties if you're parsing something that needs to be case-insensitive in some places and case-sensitive in others, or if you care about preserving case for cosmetic reasons. For example, although HTML tags are case-insensitive, converting an entire webpage to lowercase while parsing it would probably be undesirable. Even when compiling a case-insensitive programming language, converting identifiers could be annoying, as any resulting error messages would not match what the programmer wrote.
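A minimal sketch of the lowercase-everything approach, and of the drawback it carries. `createStatement` and `parseLowercased` are hypothetical names used only for this illustration.

```haskell
import Data.Char (toLower)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical parser that only knows the lowercase keyword.
createStatement :: Parser String
createStatement = do
  kw   <- string "create"
  rest <- manyTill anyChar (char ';')  -- everything up to, not including, ';'
  return (kw ++ rest)

-- Lowercase the whole input first, then run the lowercase-only parser.
parseLowercased :: String -> Either ParseError String
parseLowercased = parse createStatement "<input>" . map toLower
```

For example, `parseLowercased "CREATE TABLE T (C VARCHAR(8) DEFAULT 'Abc');"` should give `Right "create table t (c varchar(8) default 'abc')"`. The keyword matches, but the default value `'Abc'` has been lowercased as well, which is the loss of case-sensitive data described above.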

hammar
C. A. McCann
4

No, Parsec cannot do that in a clean way. string is implemented on top of the primitive tokens combinator, which is hard-coded to use an equality test (==). It's a bit simpler to parse a case-insensitive character, but you probably want more.

There is, however, a modern fork of Parsec called Megaparsec, which has built-in solutions for everything you may want:

λ> parseTest (char' 'a') "b"
parse error at line 1, column 1:
unexpected 'b'
expecting 'A' or 'a'
λ> parseTest (string' "foo") "Foo"
"Foo"
λ> parseTest (string' "foo") "FOO"
"FOO"
λ> parseTest (string' "foo") "fo!"
parse error at line 1, column 1:
unexpected "fo!"
expecting "foo"

Note the last error message: it's better than what you can get by parsing characters one by one (especially useful in your particular case). string' is implemented just like Parsec's string but uses case-insensitive comparison to compare characters. There are also oneOf' and noneOf', which may be helpful in some cases.
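For use outside GHCi, a small sketch of the same idea in a source file. This assumes Megaparsec 6 or later, where `string'` is exported from `Text.Megaparsec.Char` and parsers carry a custom-error type parameter; the session above comes from an older release, so details may differ. `Parser`, `createKeyword` and `matchCreate` are hypothetical names for this example.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (string')

-- Assumed parser type: no custom error component, String input.
type Parser = Parsec Void String

-- Case-insensitive keyword; returns the text as it appeared in the input.
createKeyword :: Parser String
createKeyword = string' "create"

-- parseMaybe succeeds only if the parser consumes the whole input.
matchCreate :: String -> Maybe String
matchCreate = parseMaybe createKeyword
```

Under those assumptions, `matchCreate "CrEaTe"` should return `Just "CrEaTe"`, while `matchCreate "crate"` should return `Nothing`.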


Disclosure: I'm one of the authors of Megaparsec.

Mark Karpov
  • 1
    Indeed surprising that `tokens` does not allow a compare function to be passed to it to perform the comparison in the original Parsec. – MicroVirus Feb 29 '16 at 16:14
0

Instead of mapping the entire input with toLower, consider using caseString from Text.ParserCombinators.Parsec.Rfc2234 (from the hsemail package):

Text.ParserCombinators.Parsec.Rfc2234

p_create_t :: GenParser Char st Statement
p_create_t = do
  x <- (caseString "create")
  xs <- manyTill anyChar (char ';')
  return $ CreateTable (x ++ xs) []  -- refine later

So now x will be whatever case-variant is present in the input without changing your input.
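If adding hsemail as a dependency is not desirable, a comparable helper can be written locally. The sketch below is not hsemail's definition; `caseCharLocal`, `caseStringLocal` and `parseCreateTable` are hypothetical names, and the `Statement` type is a stand-in for the one in the question, whose real definition is not shown.

```haskell
import Data.Char (toLower)
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for the question's Statement type.
data Statement = CreateTable String [String]
  deriving (Show, Eq)

-- Local helper (an assumption, not hsemail's code): match one character
-- ignoring case, returning it as written in the input.
caseCharLocal :: Char -> GenParser Char st Char
caseCharLocal c = satisfy (\x -> toLower x == toLower c)

-- Match a whole string ignoring case; 'try' avoids consuming input on failure.
caseStringLocal :: String -> GenParser Char st String
caseStringLocal s = try (mapM caseCharLocal s) <?> show s

p_create_t :: GenParser Char st Statement
p_create_t = do
  x  <- caseStringLocal "create"
  xs <- manyTill anyChar (char ';')
  return $ CreateTable (x ++ xs) []  -- refine later

-- Hypothetical runner for the parser above.
parseCreateTable :: String -> Either ParseError Statement
parseCreateTable = parse p_create_t "<input>"
```

Here `parseCreateTable "Create TABLE t (id INT);"` should produce `Right (CreateTable "Create TABLE t (id INT)" [])`, with the keyword kept in its original casing.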

PS: I know that this is an ancient question. I just thought I would add this, as the question came up while I was searching for a similar problem.

gremble
  • This is not from Text.ParserCombinators.Parsec. It is from Text.ParserCombinators.Parsec.Rfc2234. Your link is correct, but your title is wrong. Also of note: it is part of the hsemail package, which someone might not already have installed. – Sean Jul 02 '15 at 20:19
0

There is a package named parsec-extra for this purpose. You need to install this package and then use the 'caseInsensitiveString' parser.

 :m Text.Parsec
 :m +Text.Parsec.Extra

*> parseTest   (caseInsensitiveString  "values")   "vaLUES"
"values"

*> parseTest   (caseInsensitiveString  "values")   "VAlues"
"values"

Link to package is here: https://hackage.haskell.org/package/parsec-extra

suhao399