I'm writing a parser for a logfile. One of the lines in the logfile lists the parameters of an HTTP request:

  Parameters: {"back"=>"true", "embed_key"=>"12affbbace", "action"=>"index", "ajax"=>"1", "controller"=>"heyzap", "embed"=>"1"}

I'm having trouble parsing this with Attoparsec. My basic idea is to parse and discard `Parameters: {`, then keep the text up to `}`. Then I'll parse that text into a list of (key, value) tuples. Here's what I've got so far:

parseParams :: Parser [(Text, Text)]
parseParams = do
    paramString <- "  Parameters: {" *> takeTill' (== '}')
    let params = splitOn ", " paramString
    -- I'm not sure how to apply parseParamPair to params here

parseParamPair :: Parser (Text, Text)
parseParamPair = do
    key <- parseKeyOrValue
    value <- string "=>" >> parseKeyOrValue
    return (key, value)
    where
        parseKeyOrValue :: Parser Text
        parseKeyOrValue = char '"' >> takeTill' (== '"')

takeTill' :: (Char -> Bool) -> Parser Text
takeTill' func = takeTill func <* skip func

How can I implement this? Should I be using Data.Attoparsec.Text.sepBy somehow?

MaxGabriel
  • I assume you mean `Data.Attoparsec.Text.sepBy`? – dfeuer Mar 16 '15 at 03:46
  • Also, Hoogle can't find `takeTill'`, and `takeTill` does not look like something you're really likely to want here. – dfeuer Mar 16 '15 at 03:49
  • Thanks, I did mean `Data.Attoparsec.Text.sepBy`. `takeTill'` is a helper function I made that does `takeTill` + also discards the next character (I added the source for that to the question). My line of thinking is: discard up to the first `{`. Then take the input up to the closing `}` (that's the `takeTill'` part). Then parse that in-between text into the (key, value) pairs. – MaxGabriel Mar 16 '15 at 05:28
  • Be careful with that strategy. What if I returned `Parameters: {"data" => "{}"}`? – luqui Mar 16 '15 at 05:45
  • I think I'm protected against having `{}` inside the parameters because those characters need to be escaped in a URL (same with the comma and space), though I'm open to solutions that don't depend on that. – MaxGabriel Mar 16 '15 at 06:09
  • Oh darn, good call @luqui. I just checked and I found `@` signs inside the URL parameters, so that must mean that `{`, `}`, `,` and ` ` aren't escaped either. I'll have to account for that. – MaxGabriel Mar 16 '15 at 06:19
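
One way to sidestep the problem raised in the comments might be to skip the pre-splitting step and run the pair parser directly on the input with `sepBy`, so that braces, commas, and spaces inside a quoted value are never treated as delimiters. This is only a minimal sketch, assuming values never contain an escaped double quote and that pairs are always separated by exactly `, `; the names `quoted` and `paramPair` are made up for illustration and are not part of Attoparsec:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text
    (Parser, char, parseOnly, sepBy, skipSpace, string, takeTill)
import Data.Text (Text)

-- | A double-quoted string, returned without its quotes.
-- Assumption: the log never escapes a '"' inside a key or value.
quoted :: Parser Text
quoted = char '"' *> takeTill (== '"') <* char '"'

-- | One "key"=>"value" pair; whitespace around "=>" is tolerated.
paramPair :: Parser (Text, Text)
paramPair = do
    key <- quoted
    _ <- skipSpace *> string "=>" <* skipSpace
    value <- quoted
    return (key, value)

-- | The whole line: optional leading whitespace, the "Parameters: {" prefix,
-- zero or more pairs separated by ", ", and the closing brace.
parseParams :: Parser [(Text, Text)]
parseParams =
    skipSpace
        *> string "Parameters: {"
        *> (paramPair `sepBy` string ", ")
        <* char '}'
```

With this shape, `parseOnly parseParams "  Parameters: {\"data\" => \"{}\"}"` should give `Right [("data","{}")]`, because the `{}` is consumed inside `quoted` before the parser ever looks for the closing brace.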

0 Answers