I am processing a syslog logfile, treating each line as an individual syslog entry, and parsing each entry with an Attoparsec parser. So I am using

fileToBS :: IO Handle -> C.Source (ResourceT IO) BS.ByteString
fileToBS handleMaker = source C.$= bsSplitterConduit
  where source = CB.sourceIOHandle handleMaker
        bsSplitterConduit = CB.lines

to generate the stream of syslog entries. I am using

parseToLogData:: C.Conduit BS.ByteString (ResourceT IO) (Either CATT.ParseError (CATT.PositionRange, LogData))
parseToLogData = CATT.conduitParserEither syslogParser

to convert those bytestrings into syslog values. Syslog values are generated from this parser (with some type synonyms of my own):

syslogParser :: Parser LogData
syslogParser = do
  pri <- priority <?> "priority parse error"
  mbDate <- date <?> "date parse error"
  space
  srcAddr <- ip
  space
  msg <- ATT.takeByteString
  return LogData{pri = pri, timestamp = mbDate, source = srcAddr, message = msg}

priority :: Parser Priority
priority = do
  string "<"
  digitsString <- takeWhile1 digit
  string ">"
  return (RawPriority digitsString)

date :: Parser (Maybe UTCTime)
date = do
  rawDate <- ATT.take 15
  let stringDate = BS.unpack rawDate
  let parsedDate = parseTime defaultTimeLocale syslogDateFormat stringDate
  return parsedDate

ip :: Parser IPAddress
ip = do
  oct0 <- takeWhile1 digit
  period
  oct1 <- takeWhile1 digit
  period
  oct2 <- takeWhile1 digit
  period
  oct3 <- takeWhile1 digit
  return (oct0, oct1, oct2, oct3)
--ip = takeWhile1 (\x -> digit x || x == 46)

space = string " "
colon = string ":"
period = string "."

digit test = (test >= 48 && test <= 57)
octet = digit

The issue is the line which takes all the rest of the syslog entry (msg <- ATT.takeByteString). This function does not play nice with streams because it needs a termination signal if using a resumable parser (which is what conduit's attoparsec library uses).

I have tried yielding empty bytestrings to fix this behavior, but it does not work as expected (see "Incremental input" at https://hackage.haskell.org/package/attoparsec-0.12.1.2/docs/Data-Attoparsec-ByteString.html ). The parser consumes the entire syslog input file into one parsed value. This is an 80MB test file, so after the initial field extraction, all of the subsequent syslog messages end up in the message field of a single syslog value.
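To make the behavior concrete, here is a minimal sketch using attoparsec on its own, outside of conduit. It assumes only the `attoparsec` and `bytestring` packages; the names `describe`, `afterFirstChunk`, `afterSecondChunk`, `afterEmptyChunk` and `oneShot` are made up for illustration and are not part of the codebase above. It shows that `takeByteString` under the incremental `parse` keeps asking for more input until it is fed an empty chunk, while `parseOnly` treats its argument as the complete input.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Attoparsec.ByteString (IResult (..), Result, feed, parse,
                                             parseOnly, takeByteString)
import qualified Data.ByteString            as BS

-- Render a parse result as a short string (illustrative helper).
describe :: Result BS.ByteString -> String
describe (Done _ r)   = "Done " ++ show r
describe (Partial _)  = "Partial"
describe (Fail _ _ e) = "Fail " ++ e

-- Incremental parsing: takeByteString cannot know the input has ended,
-- so it stays Partial after every non-empty chunk.
afterFirstChunk, afterSecondChunk, afterEmptyChunk :: Result BS.ByteString
afterFirstChunk  = parse takeByteString "first line"
afterSecondChunk = feed afterFirstChunk "second line"

-- Feeding an empty chunk is attoparsec's end-of-input signal; only now
-- does the parser finish, having swallowed both chunks.
afterEmptyChunk  = feed afterSecondChunk BS.empty

-- One-shot parsing: parseOnly treats its argument as the whole input,
-- so takeByteString stops at the end of this single chunk.
oneShot :: Either String BS.ByteString
oneShot = parseOnly takeByteString "first line"
```

Here `describe afterFirstChunk` and `describe afterSecondChunk` both give `"Partial"`, and `afterEmptyChunk` is a `Done` holding the two chunks concatenated. Whether an empty chunk yielded inside a conduit pipeline actually reaches the parser in this way depends on how the conduit adapter handles empty input, which this sketch does not cover.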

Here is my terminator conduit to try and signal "atomic message" behavior. I am not sure why it does not work.

terminator :: C.Conduit BS.ByteString (ResourceT IO) BS.ByteString
terminator = C.awaitForever yieldAndAddTerminator
  where
    yieldAndAddTerminator bs = do
      C.yield bs
      C.yield terminator
    terminator = ""

How can I treat UDP messages as atomic pieces of data in the conduit world?

A copy of this codebase can be found here: https://github.com/tureus/safe-forwarder .

xrl

1 Answer

You probably want to fuse your parseToLogData with a function that prevents it from consuming a new line (ASCII code 10). Using conduit-combinators terminology, something like:

takeWhileCE (/= 10) =$= parseToLogData
dropWhileCE (/= 10) >> dropCE 1 -- flush the rest of it

You may also want to look into the line combinator function.

Michael Snoyman
  • I thought `fileToBS` (which uses the Data.Conduit.Binary function `lines`) would consume the newline. I do manage to break the input file into discrete ByteStrings. The issue is that these ByteStrings are being fed into the same parser instance. Instead, I want incomplete parses to be reported as errors (`Left`), and subsequent `ByteString` values should be parsed independently. – xrl Jan 19 '15 at 17:52
  • In that case you need to limit to one chunk of input, which can be done with Data.Conduit.List.isolate. – Michael Snoyman Jan 19 '15 at 18:56
  • This does not appear to be feeding the ByteStrings down as individual chunks -- it seems to only consume one line then stop processing any more. `fileToBS openAction C.$= (CL.isolate 1) C.$= parseToLogData C.$$ printer` – xrl Jan 20 '15 at 19:46
  • 1
    Right. You'd need to do this in a loop to consume multiple lines. – Michael Snoyman Jan 21 '15 at 00:38
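
The loop mentioned in the last comment could look roughly like the sketch below. It is written against the conduit 1.3 API (`ConduitT`, `.|`, `runConduitPure`) rather than the older `C.$=` operators used in the question, and `perChunk`, `concatAll`, `isolated` and `shared` are hypothetical names introduced here. `concatAll` is only a stand-in for a greedy consumer such as `parseToLogData`: it swallows everything it is given, so it makes the effect of `CL.isolate 1` visible without needing the syslog parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString   as BS
import           Data.Conduit      (ConduitT, runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL

-- Run a fresh copy of `inner` for every upstream chunk. Each copy sees
-- exactly one chunk followed by end-of-input, thanks to CL.isolate 1.
-- Caveat: `inner` must consume the chunk it is offered, otherwise the
-- peek below keeps seeing the same chunk and the loop never ends.
perChunk :: Monad m => ConduitT a b m () -> ConduitT a b m ()
perChunk inner = loop
  where
    loop = do
      next <- CL.peek              -- look ahead without consuming
      case next of
        Nothing -> return ()       -- upstream is exhausted
        Just _  -> (CL.isolate 1 .| inner) >> loop

-- Stand-in for a greedy parser: consume all available input and emit
-- it as a single value.
concatAll :: Monad m => ConduitT BS.ByteString BS.ByteString m ()
concatAll = CL.consume >>= yield . BS.concat

-- With the loop, each chunk is handled on its own.
isolated :: [BS.ByteString]
isolated = runConduitPure
  (CL.sourceList ["<1>a", "<2>b"] .| perChunk concatAll .| CL.consume)

-- Without it, the greedy consumer merges every chunk into one value.
shared :: [BS.ByteString]
shared = runConduitPure
  (CL.sourceList ["<1>a", "<2>b"] .| concatAll .| CL.consume)
```

In the question's pipeline, the assumption is that `perChunk parseToLogData` would sit between `fileToBS` and the printer. Since the lines already arrive as discrete ByteStrings, another option is to skip the resumable parser and map attoparsec's `parseOnly syslogParser` over the stream with `CL.map`, at the cost of losing the position information that `conduitParserEither` reports.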