I'm trying to parse a log file that contains XML alongside other arbitrary output. Specifically, I want to check whether reservations have been sent to the customer successfully.

[11-28-51.440000] Sending reservation to customer
[11-28-51.492900] <?xml version="1.0" encoding="UTF-8"?><SendReservation><ReservationId>1289</ReservationId><Customer>2892</Customer>...</SendReservation>
[11-28-51.493000] Status: Successfull
[11-28-52.261000] Something different
[11-28-51.520000] Sending reservation to customer
[11-28-54.548900] <?xml version="1.0" encoding="UTF-8"?><SendReservation><ReservationId>2732</ReservationId><Customer>7856</Customer>...</SendReservation>
[11-28-54.600000] Status: Error: Reservation was rejected

Now with Logstash I need to parse some fields of the reservation, including the ReservationId. For this I can use the Logstash XML filter. However, I have to combine this with the success/error status, which is printed after the XML output as plain text.

I tried using a multiline codec on the input:

input {
  file {
    path => "test.log"
    start_position => "beginning"
    type => "reservation"
    codec => multiline {
      pattern => "\[(.*?)\](.*?)<\?xml[^>]*>"
      negate => true
      what => previous
    }
  }
}

With that, the Logstash event contains a message like this:

"message" => "[11-28-51.492900] <?xml version="1.0" encoding="UTF-8"?><SendReservation><ReservationId>1289</ReservationId><Customer>2892</Customer>...</SendReservation>\n[11-28-51.493000] Status: Successfull\n[11-28-52.261000] Something different\n[11-28-51.520000] Sending reservation to customer"

To parse the XML with the XML filter, I need a source field that contains valid XML. Therefore I'm trying to cut away the timestamp in front of the XML and everything after it.

    mutate {
        gsub => [ "message", "^(.*?)<\?xml[^>]*>", "" ]
    }
    mutate {
        gsub => [ "message", "(?<=<\/SendReservation>).*$", "" ]
    } 

At this point I see that the regex only matches within the first line of the message (before the first \n), which means that cutting away everything after the end tag has no effect. This is my first problem, and it might have something to do with multiline.

The second problem is that I have no clue how to move the XML content that I cut out of 'message' into a new field, which I can then use as the source field for the XML filter. I tried grok's overwrite option, but it requires an existing field, and I need to create a new one.

So in conclusion, all I want is to create a head and a tail field from my multiline message. Head would contain the first line with the XML, which holds the main information, and tail would contain the rest, with additional information that I have to relate to it.

Danny

1 Answer

OK, thanks to https://regex101.com and http://grokconstructor.appspot.com I found the solution myself.

I have to use:

grok { match => { "message" => "(?<head>(\[(.*?)\](.*?)<\?xml[^>]*>(.*?)<\/SendReservation>))+(?<tail>(?<=<\/SendReservation>)(.|\n)*$)" } }

Answer to the first problem: I have to take the \n into account, because . does not match a newline by default: (?<=<\/SendReservation>)(.|\n)*$
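An alternative to (.|\n) is the inline (?m) flag, which in the Ruby-style (Oniguruma) regex engine used by Logstash lets . match newlines as well. Here is a minimal sketch of the second gsub from the question with that flag. It assumes the multiline event is still in the message field, and it has not been tested against the full log:

```
filter {
  mutate {
    # (?m) makes "." also match "\n", so ".*" runs to the end of the
    # whole multiline message instead of stopping at the first line break.
    gsub => [ "message", "(?m)(?<=<\/SendReservation>).*", "" ]
  }
}
```

Keep in mind that this throws away the status line together with the rest of the text, so it only fits when nothing after the XML is needed.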

Answer to the second problem: grok creates a field for every named capture group in the pattern. In this case the grok pattern (?<head>(regex_for_xml))+(?<tail>(regex_for_text)) creates a head and a tail field.
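Note that head as captured by that pattern still starts with the timestamp, so it is not yet valid XML for the XML filter. One possible way to wire the pieces together is sketched below. It is an untested assumption, not a verified config: the field names logtime, xml, tail, status and reservation_id are made up for illustration, and the XPath expression assumes the element layout shown in the sample log.

```
filter {
  grok {
    # (?m) lets "." match newlines; every named group becomes a field.
    # "xml" holds only the XML document, "tail" everything after it.
    match => { "message" => "(?m)^\[(?<logtime>[^\]]+)\]\s*(?<xml><\?xml[^>]*>.*?<\/SendReservation>)\s*(?<tail>.*)" }
  }
  grok {
    # Pull the text of the first "Status:" line out of the tail.
    match => { "tail" => "Status: (?<status>[^\n]+)" }
  }
  xml {
    # Parse the clean XML field; copy only the ReservationId into the event.
    source => "xml"
    store_xml => false
    xpath => [ "/SendReservation/ReservationId/text()", "reservation_id" ]
  }
}
```

Values extracted through xpath end up as arrays, so reservation_id would need a further mutate step if a plain string is required.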

Danny