I'm writing a parser for large chunks of English text using attoparsec. Everything has worked well so far, except for parsing the sequence "――". As far as I can tell, it is just two dashes together, "--". The strange thing is that the parser matches it in this code:
wordSeparator :: Parser ()
wordSeparator = many1 (space <|> satisfy (inClass "――?!,:")) >> pure ()
but not in this case:
specialChars = ['――', '?', '!', ',', ':']
wordSeparator :: Parser ()
wordSeparator = many1 (space <|> satisfy (inClass specialChars)) >> pure ()
I'm using the list specialChars because I have a lot of characters to consider and I use it in multiple places. For example, given the input "I am ――Walt Whitman._", the output is supposed to be {"I", "am", "Walt", "Whitman."}.
I believe the problem is that "――" is not a single Char. How do I fix this?
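One possible direction, as a minimal sketch rather than a definitive fix: a Char literal holds exactly one character, so the doubled dash has to be listed as a single character and matched twice by the repetition. The sketch assumes the character is U+2015 HORIZONTAL BAR (written as an escape below) and that the parser is built on Data.Attoparsec.Text, since the Data.Attoparsec.ByteString.Char8 variant only handles 8-bit characters. The names word and splitWords are hypothetical helpers added for illustration.

```haskell
import Control.Applicative (many, optional, (<|>))
import Data.Attoparsec.Text
  ( Parser, endOfInput, inClass, parseOnly
  , satisfy, skipMany1, space, takeWhile1 )
import Data.Char (isSpace)
import Data.Text (Text)
import qualified Data.Text as T

-- A String is a list of Char, so every separator is one character.
-- Assumption: the dash in the input is U+2015 (HORIZONTAL BAR); the
-- doubled dash is then two consecutive occurrences of it.
specialChars :: String
specialChars = "\x2015?!,:"

-- One or more whitespace or special characters, result discarded.
wordSeparator :: Parser ()
wordSeparator = skipMany1 (space <|> satisfy (inClass specialChars))

-- Hypothetical helper: a word is a non-empty run of characters that
-- are neither whitespace nor listed in specialChars.
word :: Parser Text
word = takeWhile1 (\c -> not (isSpace c || inClass specialChars c))

-- Hypothetical helper: split a whole input into words.
splitWords :: Text -> Either String [Text]
splitWords =
  parseOnly (optional wordSeparator *> many (word <* optional wordSeparator) <* endOfInput)
```

Applied to the sample input without its trailing "_" (which is not in the separator list), splitWords should yield the four words "I", "am", "Walt" and "Whitman.". Because wordSeparator repeats a single-character match, any run of dashes, spaces and punctuation is consumed as one separator.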