
I'm reading XML-formatted input and trying to extract each row of an HTML table as a separate event.

For example, if my input is:

<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> <tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>

I want the output to be:

{
       "message" => "<tr> <td> 1 </td> <td> 2 </td> </tr>",
      "@version" => "1",
    "@timestamp" => "2015-03-20T10:30:38.234Z",
          "host" => "VirtualBox"
}
{
       "message" => "<tr> <td> 3 </td> <td> 4 </td> </tr>",
      "@version" => "1",
    "@timestamp" => "2015-03-20T10:30:38.234Z",
          "host" => "VirtualBox"
}

The problem is that I need to split one event into multiple events. Using the split filter didn't work because it removes the string used as the "terminator".

I designed a custom grok pattern to extract the content of an HTML row: (?<data><tr>(.)*?</tr>)

Unfortunately, this pattern only detects the first occurrence. Each XML document contains a finite number of rows, but that number is not known in advance.

Having looked at LOGSTASH-703 on the Logstash JIRA, I'm afraid grok cannot match a single pattern multiple times (for now, March 2015).

Am I forced to write my own custom filter? Is it possible to store each match of a grok filter as a new event?
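To make the desired behavior concrete outside of Logstash, here is a minimal Python sketch that collects every match of the same non-greedy row pattern rather than only the first one. It is only an illustration of the extraction I am after, not a Logstash solution; the names `ROW_PATTERN` and `extract_rows` are hypothetical.

```python
import re

# Same non-greedy idea as the grok pattern above.
# re.DOTALL lets '.' also match newlines, in case a row spans several lines.
ROW_PATTERN = re.compile(r"<tr>.*?</tr>", re.DOTALL)


def extract_rows(xml_text):
    """Return every <tr>...</tr> block found in xml_text, in document order."""
    return ROW_PATTERN.findall(xml_text)


sample = (
    "<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
    "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>"
)

rows = extract_rows(sample)
for row in rows:
    print(row)  # one line per table row, i.e. one "event" per row
```

Each element of `rows` corresponds to the "message" value of one of the events in the expected output.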

Here is my filter:

    input {
        stdin { }
    }

    filter {
        mutate {
            gsub => ["message", "<tr>", "[split]<tr>"]
        }
        mutate {
            gsub => ["message", "</tr>", "</tr>[split]"]
        }
        split {
            terminator => "[split]"
        }
        grok {
            patterns_dir => "../patterns"
            # see why the same pattern does not match multiple times
            #https://logstash.jira.com/browse/LOGSTASH-703
            match => ["message", "%{HTML_ROW_LINE:data}" ]
        }
    }

    output {
        stdout {
            codec => rubydebug
        }
    }

I find that when I split the event before and after each row, the grok filter no longer seems to work. I do get what I want in the "message" field, but the "data" field is no longer populated.

The strange thing is that I get no "_grokparsefailure" tag even though there is no "data" field. This seems to indicate that there actually is a match, but that it is not stored in a field.
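For reference, the intent of the two gsub steps followed by the split can be simulated in plain Python. This is a sketch under the assumption that `str.replace` and `str.split` stand in for `mutate`/`gsub` and `split`; it says nothing about how Logstash itself handles the resulting events. The function names are hypothetical.

```python
MARKER = "[split]"  # same marker string as in the filter above


def mark_and_split(message):
    """Wrap every row in markers, then split the message on the marker."""
    marked = message.replace("<tr>", MARKER + "<tr>")
    marked = marked.replace("</tr>", "</tr>" + MARKER)
    return marked.split(MARKER)


def keep_rows(pieces):
    """Keep only the pieces that are table rows, dropping surrounding markup."""
    return [piece for piece in pieces if piece.startswith("<tr>")]


sample = (
    "<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
    "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>"
)

pieces = mark_and_split(sample)
row_events = keep_rows(pieces)
for event in row_events:
    print(event)
```

In this simulation the split also yields pieces that are not rows (the leading `<xml> <table> `, the whitespace between rows, and the trailing ` </table> </xml>`), so a later step still has to discard anything that does not match the row pattern.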

vdolez
  • You need a filter that creates new events, which split{} will do. Can you add the terminator string back on with mutate->gsub{} ? – Alain Collins Mar 22 '15 at 04:29
  • I was indeed exploring the mutate->gsub approach. But since I split my events, I think the grok filter doesn't match anymore. Thanks for your comment anyway :) – vdolez Mar 23 '15 at 09:10
  • Once you run the split{}, all filters after that should be applied. – Alain Collins Mar 23 '15 at 15:05
  • I found a JIRA issue about a bug in the split filter: https://logstash.jira.com/browse/LOGSTASH-1312 It will apparently be fixed in the next version (1.5). For now I think I'll just create a temporary output file that will be used as input. – vdolez Mar 23 '15 at 15:28
