I'm writing a program to modify source code files. I need to parse the file (e.g. with megaparsec), modify its abstract syntax tree (AST) (e.g. with Uniplate), and regenerate the file with as few changes as possible (e.g. preserving whitespace, comments, ...).

So the AST should contain the whitespace, for example:

data Identifier = Identifier String String

where the first string is the name of the identifier and the second is the whitespace that follows it. The same applies to every symbol in the language.

How can I write a parser for Identifier?

Pierre Carbonnelle
    I think this has been asked before, but don't know of a standard solution (I have once done this myself in a somewhat cumbersome way). – leftaroundabout Jun 30 '17 at 08:18

1 Answer

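I ended up writing parseLexeme to replace the lexeme combinator used in this tutorial. Instead of discarding the trailing whitespace, it stores it next to the parsed value: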

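-- Assumes megaparsec 6 or later, where the Parser alias is defined by the user.
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- A parsed value together with the whitespace that followed it in the source.
data Lexeme a = Lexeme a String

-- Consume zero or more whitespace characters and return them verbatim.
whites :: Parser String
whites = many spaceChar

-- Run p, then capture (rather than discard) the trailing whitespace.
parseLexeme :: Parser a -> Parser (Lexeme a)
parseLexeme p = Lexeme <$> p <*> whites

-- PPrint is a user-defined pretty-printing class, assumed to provide pprint :: a -> String.
instance PPrint a => PPrint (Lexeme a) where
  pprint (Lexeme value w) = pprint value ++ w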

The parser for Identifier then becomes:

data Identifier = Identifier (Lexeme String)

-- A letter followed by letters, digits, or underscores, plus any trailing whitespace.
parseIdentifier :: Parser Identifier
parseIdentifier =
  Identifier <$> parseLexeme ((:) <$> letterChar <*> many (alphaNumChar <|> char '_'))

instance PPrint Identifier where
  pprint (Identifier l) = pprint l
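
To check that this approach really regenerates the input unchanged, here is a minimal self-contained sketch of a round trip. It assumes megaparsec 6 or later. The names Source, parseSource, renderSource, rename and roundTrip are hypothetical helpers introduced only for this illustration, and plain render functions stand in for the PPrint class:

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, many, parseMaybe, (<|>))
import Text.Megaparsec.Char (alphaNumChar, char, letterChar, spaceChar)

type Parser = Parsec Void String

-- A parsed value plus the whitespace that followed it.
data Lexeme a = Lexeme a String deriving (Eq, Show)

newtype Identifier = Identifier (Lexeme String) deriving (Eq, Show)

-- Hypothetical toy "file": leading whitespace, then a sequence of identifiers.
data Source = Source String [Identifier] deriving (Eq, Show)

whites :: Parser String
whites = many spaceChar

parseLexeme :: Parser a -> Parser (Lexeme a)
parseLexeme p = Lexeme <$> p <*> whites

parseIdentifier :: Parser Identifier
parseIdentifier =
  Identifier <$> parseLexeme ((:) <$> letterChar <*> many (alphaNumChar <|> char '_'))

-- Leading whitespace has no preceding lexeme, so it is stored on the root node.
parseSource :: Parser Source
parseSource = Source <$> whites <*> many parseIdentifier

renderIdentifier :: Identifier -> String
renderIdentifier (Identifier (Lexeme name ws)) = name ++ ws

renderSource :: Source -> String
renderSource (Source leading ids) = leading ++ concatMap renderIdentifier ids

-- Rename every occurrence of one identifier, leaving its whitespace untouched.
rename :: String -> String -> Source -> Source
rename old new (Source leading ids) = Source leading (map go ids)
  where
    go (Identifier (Lexeme name ws))
      | name == old = Identifier (Lexeme new ws)
      | otherwise   = Identifier (Lexeme name ws)

-- Parse the whole input (parseMaybe requires all input to be consumed) and render it back.
roundTrip :: String -> Maybe String
roundTrip = fmap renderSource . parseMaybe parseSource
```

In GHCi, roundTrip "foo  bar_1\n\tbaz " should give back Just the same string, and fmap (renderSource . rename "bar" "qux") (parseMaybe parseSource "foo  bar\n") should give Just "foo  qux\n". Note that whitespace before the first token is not attached to any lexeme, which is why this sketch keeps it on the root node.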
Pierre Carbonnelle