How to parse multi line XML in logstash?

Question

I have multiline XML files (~800 lines) in my S3 bucket and I want to index them in Elasticsearch, but I can't parse them in Logstash. Fields are sometimes empty, so it's impossible to parse the files manually.

My xml looks like:

<ServiceSalesClosed>
   <ErrorLevel>0</ErrorLevel>
   <ErrorMessage/>
   <LaborSaleCustomerPay>50.00</LaborSaleCustomerPay>
   ...

In my input I have the config:

codec => multiline
{ 
pattern => "<ServiceSalesClosed.*"
what => next
}

In my filter the following config:

multiline { 
pattern => ["\t\t"]
what => next
}

You don't mention what it is that is causing you problems. In general, use the multiline codec or filter to make a single event, then pass it to the xml{} filter. — Alain Collins, Jan 15 '16 at 00:19
What's that second filter supposed to do? There's no sign of tabs in your file. But check your `_source` field in elasticsearch - does this contain your complete XML or not? (And if it doesn't, can you post a sample of what it _does_ contain?) — Sobrique, Jan 19 '16 at 19:39
I want lines matching a single event and I want every line to be a Json field — Antoine L., Jan 19 '16 at 19:42
`_source.message` doesn't contain my complete XML, only "\n 0" — Antoine L., Jan 19 '16 at 19:53
@Sobrique And in my console I get this error: "{:timestamp=>"2016-01-19T15:09:28.960000-0500", :message=>"Trouble parsing json", :source=>"message", :raw=>"", :exception=># — Antoine L., Jan 19 '16 at 20:13
You wouldn't use both the codec and filter. Your codec says: Anything that contains " — Alain Collins, Jan 19 '16 at 21:41

score 9 · Accepted Answer · answered Jan 20 '16 at 09:34

OK, so it looks like the problem is that you've confused the roles of the multiline codec and the XML filter.

Can I suggest you set your multiline up:

codec => multiline {
     pattern => "<ServiceSalesClosed>"
     negate => true
     what => "previous"
}

This will take any line that doesn't contain this tag, and keep it with the previous line(s). This should group your XML stanzas into parsable chunks. You should see the results of this in _source.
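
For context, here is a minimal sketch of how that codec might sit inside an S3 input, since the question reads the files from a bucket. The bucket name and region are placeholder assumptions, not values from the question.

```
input {
  s3 {
    # Hypothetical bucket and region; substitute your own.
    bucket => "my-xml-bucket"
    region => "us-east-1"
    codec => multiline {
      # A line containing the opening tag starts a new event...
      pattern => "<ServiceSalesClosed>"
      # ...and every line that does NOT match is appended to the previous event.
      negate => true
      what => "previous"
    }
  }
}
```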

Then in your filter:

filter {
  xml {
    source => "message"
    target => "xml_content"
    xpath => [ "//ErrorLevel", "error_level" ]
  }
}

This should then parse your XML and create fields in Elasticsearch under "xml_content" (containing your parsed XML), while also extracting ErrorLevel into a field of its own.
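
To check that each event really contains one complete XML document before indexing, one option is to print events to the console alongside the Elasticsearch output. This is a sketch only: the host and index name are assumptions, and the `stdout` output is just a debugging aid.

```
output {
  # Debugging aid: prints each event with all its fields, so you can
  # confirm "message" holds the full XML and "xml_content" was populated.
  stdout { codec => rubydebug }

  elasticsearch {
    # Hypothetical host and index name.
    hosts => ["localhost:9200"]
    index => "service-sales"
  }
}
```

Once the console output looks right, the `stdout` block can be removed.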

Sobrique


all my xml is in a tag " ; [...my events....] " I want to ignore this tag, I tried remove_tag but didn't work. Do you know how I can do it? @Sobrique – Antoine L. Jan 28 '16 at 18:51
That would be a separate question, I suggest you ask it as such. – Sobrique Jan 28 '16 at 18:53
I needed to add `auto_flush_interval => 1` to process the last event. – rjurado01 Dec 23 '16 at 12:03
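
The likely reason is that with `what => "previous"` the codec cannot know an event is finished until the next line matching the pattern arrives, so the final event in a file stays buffered. A sketch of the codec with the flush option added, assuming a multiline codec version that supports `auto_flush_interval` (the value is in seconds):

```
codec => multiline {
  pattern => "<ServiceSalesClosed>"
  negate => true
  what => "previous"
  # Emit the buffered event if no new line arrives within 1 second,
  # so the last XML document in a file is not left waiting.
  auto_flush_interval => 1
}
```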
