I am processing a syslog logfile, treating each line as an individual syslog entry and parsing that entry with an Attoparsec parser. I am using
fileToBS :: IO Handle -> C.Source (ResourceT IO) BS.ByteString
fileToBS handleMaker = source C.$= bsSplitterConduit
  where
    source = CB.sourceIOHandle handleMaker
    bsSplitterConduit = CB.lines
to generate the stream of syslog entries. I am using
parseToLogData :: C.Conduit BS.ByteString (ResourceT IO) (Either CATT.ParseError (CATT.PositionRange, LogData))
parseToLogData = CATT.conduitParserEither syslogParser
to convert those bytestrings into syslog values. Syslog values are generated from this parser (with some type synonyms of my own):
-- The parser builds a LogData record, so its result type is LogData.
syslogParser :: Parser LogData
syslogParser = do
  pri <- priority <?> "priority parse error"
  mbDate <- date <?> "date parse error"
  space
  srcAddr <- ip
  space
  -- Everything after the source address is the message body.
  msg <- ATT.takeByteString
  return LogData{pri = pri, timestamp = mbDate, source = srcAddr, message = msg}

priority :: Parser Priority
priority = do
  string "<"
  digitsString <- takeWhile1 digit
  string ">"
  return (RawPriority digitsString)

date :: Parser (Maybe UTCTime)
date = do
  rawDate <- ATT.take 15
  let stringDate = BS.unpack rawDate
  let parsedDate = parseTime defaultTimeLocale syslogDateFormat stringDate
  return parsedDate

ip :: Parser IPAddress
ip = do
  oct0 <- takeWhile1 digit
  period
  oct1 <- takeWhile1 digit
  period
  oct2 <- takeWhile1 digit
  period
  oct3 <- takeWhile1 digit
  return (oct0, oct1, oct2, oct3)
--ip = takeWhile1 (\x -> digit x || x == 46)

space = string " "
colon = string ":"
period = string "."
digit test = (test >= 48 && test <= 57)
octet = digit
The issue is the line that takes the rest of the syslog entry (msg <- ATT.takeByteString). This function does not play well with streams: under a resumable parser (which is what conduit's attoparsec library uses), it keeps asking for more input until it receives a termination signal.
I have tried yielding empty bytestrings to fix this behavior (see "Incremental input" on https://hackage.haskell.org/package/attoparsec-0.12.1.2/docs/Data-Attoparsec-ByteString.html ), but it does not work as expected: the parser consumes the entire syslog input file into one parsed value. My test file is 80 MB, so after the initial field extraction all the subsequent syslog entries end up in the message field of a single syslog value.
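To make the behavior concrete, here is a minimal sketch that uses plain attoparsec with no conduit involved. It assumes only `parse`, `feed`, and `takeByteString` from `Data.Attoparsec.ByteString`; the `describe` helper is a hypothetical name introduced for this illustration. It shows that `takeByteString` keeps returning `Partial` while non-empty chunks arrive and only finishes when an empty chunk is fed directly to the parser. One possible explanation for the terminator not working, stated as an assumption rather than something verified here, is that the conduit integration drops empty chunks before they reach the parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString.Char8 as BS

-- Hypothetical helper: summarize an incremental parse result as a String.
describe :: A.Result BS.ByteString -> String
describe (A.Done rest val) = "Done, value=" ++ show val ++ ", leftover=" ++ show rest
describe (A.Partial _)     = "Partial (waiting for more input)"
describe (A.Fail _ _ err)  = "Fail: " ++ err

main :: IO ()
main = do
  -- First chunk: takeByteString cannot know the input has ended.
  let r0 = A.parse A.takeByteString "first entry"
  putStrLn (describe r0)
  -- Second non-empty chunk: still waiting, the chunks accumulate.
  let r1 = A.feed r0 "second entry"
  putStrLn (describe r1)
  -- Empty chunk fed directly: attoparsec treats it as end of input.
  let r2 = A.feed r1 BS.empty
  putStrLn (describe r2)
```

In this sketch both chunks end up in the single value produced at the end, which matches the "whole file in one message" symptom described above.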
Here is my terminator conduit, which tries to signal "atomic message" behavior. I am not sure why it does not work.
terminator :: C.Conduit BS.ByteString (ResourceT IO) BS.ByteString
terminator = C.awaitForever yieldAndAddTerminator
  where
    yieldAndAddTerminator bs = do
      C.yield bs
      C.yield terminator
    terminator = ""
How can I treat UDP messages as atomic pieces of data in the conduit world?
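For reference, one direction that might give the atomic behavior is to run the parser to completion on each chunk with `parseOnly`, instead of threading a resumable parser across chunks. The following is only a sketch under assumptions: `entryParser` is a hypothetical, simplified stand-in for `syslogParser` (priority followed by the rest of the entry), and `parseEntry` is a name invented for this example. It relies on each chunk produced by `CB.lines` being one complete entry.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString.Char8 as BS

-- Hypothetical stand-in for syslogParser: "<digits>" then the rest of the entry.
entryParser :: A.Parser (BS.ByteString, BS.ByteString)
entryParser = do
  _   <- A.string "<"
  pri <- A.takeWhile1 (\w -> w >= 48 && w <= 57)  -- ASCII digits
  _   <- A.string ">"
  msg <- A.takeByteString  -- safe here: parseOnly ends the input at the chunk boundary
  return (pri, msg)

-- parseOnly never asks for more input, so each chunk is parsed atomically.
parseEntry :: BS.ByteString -> Either String (BS.ByteString, BS.ByteString)
parseEntry = A.parseOnly entryParser

main :: IO ()
main = mapM_ (print . parseEntry) (BS.lines "<13>first entry\n<14>second entry\n")
```

Inside a conduit pipeline, a function like `parseEntry` could presumably be applied per chunk with a mapping conduit such as `map` from `Data.Conduit.List`; that wiring is not shown here and is an assumption about how it would fit the code above.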
A copy of this codebase can be found here: https://github.com/tureus/safe-forwarder