I am stuck writing an attoparsec parser to parse what the Uniform Code for Units of Measure calls a <ATOM-SYMBOL>
. It's defined to be the longest sequence of characters in a certain class (that class includes all the digits 0-9) which doesn't end with a digit.
So given the input foo27
I want to consume and return foo
, for 237bar26
I want to consume and return 237bar
, for 19
I want to fail without consuming anything.
I can't figure out how to build this out of takeWhile1
or takeTill
or scan
but I am probably missing something obvious.
Update: My best attempt so far was that I managed to exclude sequences that are entirely digits
atomSymbol :: Parser Text
atomSymbol = do
r <- core
if (P.all (inClass "0-9") . T.unpack $ r)
then fail "Expected an atom symbol but all characters were digits."
else return r
where
core = A.takeWhile1 $ inClass "!#-'*,0-<>-Z\\^-z|~"
I tried changing that to test if the last character was a digit instead of if they all were, but it doesn't seem to backtrack one character at a time.
Update 2:
The whole file is at https://github.com/dmcclean/dimensional-attoparsec/blob/master/src/Numeric/Units/Dimensional/Parsing/Attoparsec.hs. This only builds against the prefixes
branch from https://github.com/dmcclean/dimensional.