The parser given at https://www.fpcomplete.com/school/starting-with-haskell/libraries-and-frameworks/text-manipulation/attoparsec appears to work, but it has a problem.
The code (repeated here) is:
{-# LANGUAGE OverloadedStrings #-}
-- This attoparsec module is intended for parsing text that is
-- represented using an 8-bit character set, e.g. ASCII or ISO-8859-15.
import Data.Attoparsec.Char8
import Data.Word
-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show
parseIP :: Parser IP
parseIP = do
    d1 <- decimal
    char '.'
    d2 <- decimal
    char '.'
    d3 <- decimal
    char '.'
    d4 <- decimal
    return $ IP d1 d2 d3 d4
main :: IO ()
main = print $ parseOnly parseIP "131.45.68.123"
If the parser is given an invalid IP address such as "1000.1000.1000.1000", it does not fail. Instead it returns a garbage result, because each number is silently wrapped around to fit into Word8.
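To illustrate the wrap-around, here is a minimal sketch that uses only base. It assumes the effect of decimal at type Word8 matches plain fromIntegral narrowing, which is how it behaves as far as I can tell; the name wrapped is made up for this example.

```haskell
import Data.Word (Word8)

-- Narrowing to Word8 keeps only the value modulo 256, with no error raised.
wrapped :: Word8
wrapped = fromIntegral (1000 :: Integer)

main :: IO ()
main = print wrapped  -- 1000 mod 256 = 232
```

So each "1000" in the input above would come out as 232 rather than causing a parse failure.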
Is there a simple way to fix this? One way is to use a larger Word type like Word32 and check if the number is less than 256. However, even that probably returns garbage if the input is pathological (e.g. overflows Word32 as well). Converting to Integer appears to be an option, as it is unbounded, but again, an adversarial input could make the program run out of memory.
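For concreteness, here is a rough sketch of the "read into a wider type and check the range" idea, with the number of digits capped before any conversion so that the intermediate value can never overflow. The octet helper is hypothetical (it is not part of attoparsec), and the sketch imports Data.Attoparsec.ByteString.Char8, which I assume is the current name for the Char8 module used above. It also accepts leading zeros such as "001", which may or may not be acceptable.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B
import Data.Char (digitToInt)
import Data.Word (Word8)

data IP = IP Word8 Word8 Word8 Word8 deriving (Show, Eq)

-- | Hypothetical helper: one octet, 1 to 3 digits with a value of at most 255.
octet :: Parser Word8
octet = do
    ds <- takeWhile1 isDigit
    -- Check the length first, so the Int computed below is at most 999.
    if B.length ds > 3
        then fail "octet: more than three digits"
        else do
            let n = B.foldl' (\acc c -> acc * 10 + digitToInt c) 0 ds :: Int
            if n > 255
                then fail "octet: value above 255"
                else return (fromIntegral n)

parseIP :: Parser IP
parseIP = IP <$> octet <* char '.'
             <*> octet <* char '.'
             <*> octet <* char '.'
             <*> octet

main :: IO ()
main = do
    print $ parseOnly parseIP "131.45.68.123"        -- expected to succeed
    print $ parseOnly parseIP "1000.1000.1000.1000"  -- expected to be a Left
```

This still reads the whole run of digits before rejecting it, so it is only a starting point and probably not the most elegant formulation.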
So what would a (hopefully elegant) parser that avoids these problems look like?