
I have started learning Haskell and decided to get acquainted with Parsec, but I have run into a problem. I'm trying to implement a parser for books in the FB2 format. Ordinary tags that contain only text parse fine, but a tag nested inside another tag does not work.

import Text.ParserCombinators.Parsec

data FB2Doc = Node String FB2Doc
            | InnText String
            deriving (Eq,Show)

parseFB2 :: GenParser Char st [FB2Doc]
parseFB2 = many test

test :: GenParser Char st FB2Doc
test = do name <- nodeStart
          value <- getvalue
          nodeEnd
          return $ Node name value

nodeStart = do char '<'
               name <- many (letter <|> digit <|> oneOf "-_")
               char '>'
               return name

nodeEnd = do string "</"
             many (letter <|> digit)
             char '>'
             spaces

gettext = do x <- many (letter <|> digit <|> oneOf "-_")
             return $ InnText x 

getvalue = do (nodeStart >> test) <|> gettext <|> return (Node "" (InnText ""))

main = do
         print $ parse parseFB2 "" "<h1><a2>ge</a2></h1> <genre>history_russia</genre>"
demonplus
harungo
  • I haven't done a real round of debugging, but the `nodeStart >> test` clause of `getvalue` looks a little fishy: it has a `nodeStart` not matched by a `nodeEnd`; it throws away the name of the node that's started; and, since `test` immediately calls `many`, it can never return an empty list of nodes. – Daniel Wagner Oct 09 '11 at 19:43
  • @FUZxxl, OP is doing this as a learning exercise, not, I presume, as a way to implement a production-quality XML parser. So "use a library that does it for you" is not really fulfilling that goal. – luqui Oct 09 '11 at 20:13
  • Could you provide more information about how it does not work. An error message, expected output, something. These things help direct our attention to important parts of your code. – luqui Oct 09 '11 at 20:14
  • @luqui I removed my comment... You are right. I forgot about that. Thanks. – fuz Oct 09 '11 at 20:17

1 Answer

I think you want this:

getvalue = try test <|> gettext

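The `try` is needed for empty nodes such as "<bla></bla>". Without it, `test` consumes the '<' of "</bla>" before failing, and since input was consumed, `<|>` never gets to run `gettext`. Wrapping `test` in `try` makes the failure backtrack, so `gettext` can then succeed with an empty string.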
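To see the change in context, here is a sketch of the parser from the question with only `getvalue` replaced. The type signatures on the helper parsers, the `_ <-` bindings, and the comments are additions for clarity and are not part of the original code; everything else is kept as the question has it.

```haskell
import Text.ParserCombinators.Parsec

data FB2Doc = Node String FB2Doc
            | InnText String
            deriving (Eq, Show)

-- A document is a sequence of top-level elements.
parseFB2 :: GenParser Char st [FB2Doc]
parseFB2 = many test

-- One element: opening tag, content, closing tag.
test :: GenParser Char st FB2Doc
test = do
  name <- nodeStart
  value <- getvalue
  nodeEnd
  return $ Node name value

-- Opening tag, e.g. "<h1>"; returns the tag name.
nodeStart :: GenParser Char st String
nodeStart = do
  _ <- char '<'
  name <- many (letter <|> digit <|> oneOf "-_")
  _ <- char '>'
  return name

-- Closing tag, e.g. "</h1>", followed by optional whitespace.
nodeEnd :: GenParser Char st ()
nodeEnd = do
  _ <- string "</"
  _ <- many (letter <|> digit)
  _ <- char '>'
  spaces

-- Plain text content; may be empty.
gettext :: GenParser Char st FB2Doc
gettext = do
  x <- many (letter <|> digit <|> oneOf "-_")
  return $ InnText x

-- Try a nested element first. If that fails, even after consuming
-- the '<' of a closing tag, backtrack and fall back to plain text.
getvalue :: GenParser Char st FB2Doc
getvalue = try test <|> gettext
```

With these definitions, `parse parseFB2 "" "<bla></bla>"` gives `Right [Node "bla" (InnText "")]`, and the input from the question, `"<h1><a2>ge</a2></h1> <genre>history_russia</genre>"`, gives `Right [Node "h1" (Node "a2" (InnText "ge")),Node "genre" (InnText "history_russia")]`.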

Sjoerd Visscher