
I have a string that can contain pretty much any character. Inside the string there is the delimiter {{{.

For example: afskjdfakjsdfkjas{{{fasdf.

Using attoparsec, what is the idiomatic way of writing a Parser () that skips all characters before {{{, but without consuming the {{{?

danidiaz

2 Answers


Use attoparsec's lookAhead (which applies a parser without consuming any input) and manyTill to write a parser that consumes everything up to (but excluding) a {{{ delimiter. You're then free to apply that parser and throw its result away.

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ( (<|>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text
import Data.Attoparsec.Combinator ( lookAhead, manyTill )

myParser :: Parser Text
myParser = T.concat <$> manyTill (nonOpBraceSpan <|> opBraceSpan)
                                 (lookAhead $ string "{{{")
                    <?> "{{{"
  where
    opBraceSpan    = takeWhile1 (== '{')
    nonOpBraceSpan = takeWhile1 (/= '{')

In GHCi:

λ> :set -XOverloadedStrings 
λ> parseTest myParser "{foo{{bar{{{baz"
Done "{{{baz" "{foo{{bar"
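Since the question asks for a Parser (), here is a minimal sketch of the same idea with the result discarded via void. The names skipToDelimiter and chunk are hypothetical, chosen only for illustration, and the sketch assumes the delimiter is always the literal {{{:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad (void)
import Data.Attoparsec.Text

-- Hypothetical name: skip everything before "{{{" and throw it away.
-- lookAhead makes the end parser peek at "{{{" without consuming it.
skipToDelimiter :: Parser ()
skipToDelimiter = void (manyTill chunk (lookAhead (string "{{{")))
  where
    -- A run of non-brace characters, or else a run of opening braces.
    chunk = takeWhile1 (/= '{') <|> takeWhile1 (== '{')
```

Running it followed by takeText shows what is left over: parseOnly (skipToDelimiter *> takeText) "afskjdfakjsdfkjas{{{fasdf" gives Right "{{{fasdf". If the input contains no {{{ at all, the parser fails rather than silently consuming everything.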
jub0bs
  • What is the rationale behind using `(nonOpBraceSpan <|> opBraceSpan)` instead of `anyChar`? – levant pied Sep 04 '20 at 22:29
  • @levantpied I wrote this answer a while ago... and I can't remember :) I'd have to retest it to see whether using `anyChar` changes the behaviour. – jub0bs Sep 11 '20 at 06:45

You can do it the slightly harder way like this:

foo = many $ do
  Just c <- fmap (const Nothing) (try $ string "{{{") <|> fmap Just anyChar
  return c

Or you could use the manyTill combinator like this:

foo = manyTill anyChar (try $ string "{{{")
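To check what the manyTill variant leaves behind, a small sketch can run the parser and then capture the remaining input with takeText. The names consuming, peeking and leftover are hypothetical, and try is omitted on the assumption that it is a no-op in attoparsec; the peeking variant wraps the end parser in lookAhead purely for comparison:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text
import Data.Text (Text)

-- The end parser matches "{{{" and consumes it.
consuming :: Parser String
consuming = manyTill anyChar (string "{{{")

-- For comparison: the end parser only peeks at "{{{".
peeking :: Parser String
peeking = manyTill anyChar (lookAhead (string "{{{"))

-- Hypothetical helper: run a parser, then return whatever input remains.
leftover :: Parser a -> Text -> Either String Text
leftover p = parseOnly (p *> takeText)
```

With the input "afskjdfakjsdfkjas{{{fasdf", leftover consuming gives Right "fasdf", while leftover peeking gives Right "{{{fasdf".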
Jeremy List
  • That doesn't leave the `{{{` unconsumed, though. – danidiaz May 29 '15 at 13:35
  • You're right. And I can't see anything in the attoparsec docs that would help with that directly. But what you can do is write the next part of the parser knowing that it will never be reached unless `{{{` is found. – Jeremy List Jun 01 '15 at 01:25
  • @JeremyList Note that `try` is just `id` in attoparsec. [*This combinator is provided for compatibility with Parsec. Attoparsec parsers always backtrack on failure.*](https://hackage.haskell.org/package/attoparsec-0.10.2.0/docs/Data-Attoparsec-Text.html#g:7) – jub0bs Jul 02 '15 at 10:05
  • Good to know. When I rolled my own parser combinators I didn't include `try` for the same reason. – Jeremy List Jul 06 '15 at 00:55