3

I am stuck writing an attoparsec parser to parse what the Uniform Code for Units of Measure calls a <ATOM-SYMBOL>. It's defined to be the longest sequence of characters in a certain class (that class includes all the digits 0-9) which doesn't end with a digit.

So given the input foo27 I want to consume and return foo, for 237bar26 I want to consume and return 237bar, for 19 I want to fail without consuming anything.

I can't figure out how to build this out of takeWhile1 or takeTill or scan but I am probably missing something obvious.

Update: My best attempt so far was that I managed to exclude sequences that are entirely digits

atomSymbol :: Parser Text
atomSymbol = do
               r <- core
               if (P.all (inClass "0-9") . T.unpack $ r)
                 then fail "Expected an atom symbol but all characters were digits."
                 else return r
  where
    core = A.takeWhile1 $ inClass "!#-'*,0-<>-Z\\^-z|~"

I tried changing that to test if the last character was a digit instead of if they all were, but it doesn't seem to backtrack one character at a time.

Update 2:

The whole file is at https://github.com/dmcclean/dimensional-attoparsec/blob/master/src/Numeric/Units/Dimensional/Parsing/Attoparsec.hs. This only builds against the prefixes branch from https://github.com/dmcclean/dimensional.

Doug McClean
  • 14,265
  • 6
  • 48
  • 70
  • 2
    Here's a combinator that might help you: `notFollowedBy p = p >> fail "not followed by"` – arrowd Dec 04 '15 at 08:30
  • Please make your question self-contained and add your import statements. Here, one is left wondering (even though one might guess) what qualified imports `A`, `P`, and `T` correspond to. – jub0bs Dec 05 '15 at 10:12
  • 1
    Could you take the whole token, reverse it, drop digits, reverse back and check if anything has left? – VlatkoB Dec 06 '15 at 08:37

1 Answers1

1

You should reformulate the problem and treat spans of digits (0-9) and spans of non-digit characters (!#-'*,:-<>-Z\\^-z|~) separately. The syntactic element of interest can then be described as

  • an optional digit span, followed by
  • a non-digit span, followed by
  • zero or more {digit span followed by a non-digit span}.
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Applicative ((<|>), many)
import Data.Char (isDigit)

import Data.Attoparsec.Combinator (option)
import Data.Attoparsec.Text (Parser)
import qualified Data.Attoparsec.Text as A
import Data.Text (Text)
import qualified Data.Text as T

atomSymbol :: Parser Text
atomSymbol = f <$> (option "" digitSpan)
               <*> (nonDigitSpan <|> fail errorMsg)
               <*> many (g <$> digitSpan <*> nonDigitSpan)
  where
    nonDigitSpan = A.takeWhile1 $ A.inClass "!#-'*,:-<>-Z\\^-z|~"
    digitSpan    = A.takeWhile1 isDigit
    f x y xss    = T.concat $ x : y : concat xss
    g x y        = [x,y]
    errorMsg     = "Expected an atom symbol but all characters (if any) were digits."

Tests

[...] given the input foo27 I want to consume and return foo, for 237bar26 I want to consume and return 237bar, for 19 I want to fail without consuming anything.

λ> A.parseOnly atomSymbol "foo26"
Right "foo"

λ> A.parseOnly atomSymbol "237bar26"
Right "237bar"

λ> A.parseOnly atomSymbol "19"
Left "Failed reading: Expected an atom symbol but all characters (if any) were digits."
jub0bs
  • 60,866
  • 25
  • 183
  • 186