2

The parser given at https://www.fpcomplete.com/school/starting-with-haskell/libraries-and-frameworks/text-manipulation/attoparsec appears to work, but it has a problem.

The code (repeated here) is:

{-# LANGUAGE OverloadedStrings #-}

-- This attoparsec module is intended for parsing text that is
-- represented using an 8-bit character set, e.g. ASCII or ISO-8859-15.
import Data.Attoparsec.Char8
import Data.Word

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

main :: IO ()
main = print $ parseOnly parseIP "131.45.68.123"

If the parser is input an invalid IP address such as "1000.1000.1000.1000", it does not fail, and returns a garbage result, due to coerced numeric conversion.

Is there a simple way to fix this? One way is to use a larger Word type like Word32 and check if the number is less than 256. However, even that probably returns garbage if the input is pathological (e.g. overflows Word32 as well). Converting to Integer appears to be an option, as it is unbounded, but again, an adversarial input could make the program run out of memory.

So what would a (hopefully elegant) parser that avoids these problems look like?

donatello
  • 5,727
  • 6
  • 32
  • 56
  • have you tried something like `unless (0 <= d1 && d1 <= 255) $ fail "d1 not in [0,255]"` `Parser` is a monad so fail and `unless` would work – epsilonhalbe Dec 15 '15 at 14:34
  • 1
    That would not work because the library would already have converted the string to a `Word8` and that would pass the condition. – donatello Dec 15 '15 at 15:00

2 Answers2

3

My understanding of your question is that you not only want to fail when the input number is too large, but also you don't want the parser to consume more input than is needed.

We can define a function to parse integers up to a maximum, failing otherwise:

import Data.Attoparsec.ByteString.Char8
import Data.Word
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Control.Applicative
import Data.List (foldl')
import Control.Monad 

decimalMax :: Integral a => Integer -> Parser a 
decimalMax dMax = do  
  let numDigs = ceiling $ log (fromIntegral(dMax+1)) / log 10
      getVal = foldl' (\s d -> s*10+fromIntegral (d-48)) 0 . B.unpack
  val <- getVal <$> scan 0 (\n c -> 
          if n > numDigs || not (isDigit c) then Nothing else Just (n+1)) 
  if val <= dMax 
    then return $ fromIntegral val 
    else fail $ "decimalMax: parsed decimal exceeded" ++ show dMax

This function computes the number of digits in the maximum number, then simply consumes at most that many digits. Your parser for IP addresses is remains almost the same:

parseIP :: Parser IP
parseIP = IP <$> dd <*> dd <*> dd <*> dig where 
  dig = decimalMax 255
  dd = dig <* char '.' 

main :: IO ()
main = do
  print $ parseOnly parseIP "131.45.68.123"
  print $ parseOnly parseIP "1000.1000.1000.1000"
user2407038
  • 14,400
  • 3
  • 29
  • 42
  • Thank you, this solves the general problem of parsing a bounded integer too. You have a minor typo - `C8.scan` should just be `scan`. – donatello Dec 16 '15 at 07:26
1

For simple non-pathological inputs you could indeed just coerce to Word8 from Integer, which is arbitrary-precision and will never overflow:

byte :: Parser Word8
byte = do
    n <- (decimal :: Parser Integer)
    if n < 256 then return n 
               else fail $ "Byte Overflow: " ++ show n ++ " is greater than 255."

Now the modified program,

parseIP = do
    d1 <- byte
    char '.'
    d2 <- byte
    char '.'
    d3 <- byte
    char '.'
    d4 <- byte
    return $ IP d1 d2 d3 d4

should produce the requisite output.

If you wanted to handle people trying to DoS you by writing "1291293919818283309400919..." as a really long number, then I foresee a bit more work needed to validate that something is really that length, so that you scan at most three digits before failing immediately on the first char '.'.

The following seems to compile and work with an import qualified Data.ByteString as BS up top:

scan0to3digits :: Int -> Char -> Maybe Int
scan0to3digits  = scan 0 helper where
  helper n c 
    | n < 3 && isDigit c  = Just (n + 1)
    | otherwise           = Nothing

byte :: Parser Word8
byte = do
    raw <- scan 0 scan0to3digits
    let p = BS.foldl' (\sum w8 -> 10 * sum + fromIntegral w8 - 48) 0 raw
    if BS.length raw == 0 
      then fail "Expected one or more digits..."
      else if p > 255
        then fail $ "Byte Overflow: " ++ show n ++ " is greater than 255."
        else return (fromInteger p)
CR Drost
  • 9,637
  • 1
  • 25
  • 36