4

I try to do this:

Parse a Text in the form:

Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}

into a list of some data structure:

[Inside "Some Text ",Outside (0,0,0),Inside " some Text ",Outside (0,0,0),Outside (0,0,0),Inside " more Text ",Outside (0,0,0)]

So these #{a,b,c}-bits should turn into different things as the rest of the text.

I have this code:

module ParsecTest where

import Text.ParserCombinators.Parsec
import Monad

type Reference = (Int, Int, Int)

data Transc = Inside String | Outside Reference
              deriving (Show)

text :: Parser Transc
text =  do
         x <- manyTill anyChar ((lookAhead reference) <|> (eof >> return (Inside "")));
         return (Inside x)

transc = reference <|> text

alot :: Parser [Transc]
alot = do
        manyTill transc eof

reference :: Parser Transc
reference = try (do{ char '#';
                  char '{';
                  a <- number;
                char ',';
                b <- number;
                char ',';
                c <- number;
                char '}';
                return (Outside (a,b,c)) })

number :: Parser Int
number = do{ x <- many1 digit;
             return (read x) }

This works as expected. You can test this in ghci by typing

parseTest alot "Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}"

But I think it's not nice.

1) Is the use of lookAhead really necessary for my problem?

2) Is the return (Inside "") an ugly hack?

3) Is there generally a more concise/smarter way to archieve the same?

Logan Capaldo
  • 39,555
  • 5
  • 63
  • 78
Secoe
  • 634
  • 4
  • 12

1 Answers1

5

1) I think you do need lookAhead as you need the result of that parse. It would be nice to avoid running that parser twice by having a Parser (Transc,Maybe Transc) to indicate an Inside with an optional following Outside. If performance is an issue, then this is worth doing.

2) Yes.

3) Applicatives

number2 :: Parser Int
number2 = read <$> many1 digit

text2 :: Parser Transc
text2 = (Inside .) . (:) 
     <$> anyChar 
     <*> manyTill anyChar (try (lookAhead reference2) *> pure () <|> eof)


reference2 :: Parser Transc
reference2 = ((Outside .) .) . (,,) 
          <$> (string "#{" *> number2 <* char ',') 
          <*> number2 
          <*> (char ',' *> number2 <* char '}')

transc2 = reference2 <|> text2

alot2 = many transc2

You may want to rewrite the beginning of reference2 using a helper like aux x y z = Outside (x,y,z).

EDIT: Changed text to deal with inputs that don't end with an Outside.

Anthony
  • 3,771
  • 23
  • 16
  • Thank you. Your version unfortunately produces an error, that initially led to the ugly hack in "text". It will fail even on the string "some". This is what me forced to put the "eof"-check into the "text"-parser and the empty Inside "" for typing purposes. – Secoe Oct 21 '11 at 19:21
  • @Sec Oe sorry about that! I had thought that `try` would deal with the `eof` itself. – Anthony Oct 21 '11 at 20:18