
I'm running a very standard ELK server to parse my Python application's logs. I set up Python to output the logs as JSON, with the log message string in a field 'msg'. This has been working really well for me, but someone accidentally spammed the logs last night with a dictionary passed directly to the message field. Because not much else was being logged last night, the first 'msg' the new index saw was parsed as an object. Now all the properly formatted log messages are being rejected with the error:

"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [msg] tried to parse field [msg] as object, but found a concrete value"}}}, :level=>:warn}

I understand that Elasticsearch can't handle both objects and strings in the same field. Does anyone know the best way to set the field type? Should this be done by mutating the events with a Logstash filter, by setting the Elasticsearch mapping, or both? Or should I pre-process the logs in my Python formatter to ensure 'msg' can never be parsed as JSON? All three options seem relatively straightforward, but I don't really understand the trade-offs.
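
The pre-processing option would look something like this sketch: coerce the message to a plain string before it is serialized.

```python
import json
import logging

class SafeJsonFormatter(logging.Formatter):
    """Sketch of the 'pre-process in the formatter' option: guarantee that
    the value written to 'msg' is always a plain string."""

    def format(self, record):
        return json.dumps({
            "@timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            # getMessage() applies %-style arguments and always returns a str,
            # so even logger.info({"user": "bob"}) becomes the dict's string
            # representation instead of a nested JSON object.
            "msg": record.getMessage(),
        })
```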

Any recommendations?

TristanMatthews
    Since you haven't gotten an answer yet, I'd be curious to hear if you settled on any particular approach? – PaulCapestany Dec 09 '15 at 15:47
  • I've been pre-processing in my python formatter, because I know that works. I tried setting my mapping to always treat some fields as strings, but I'm having trouble getting elasticsearch to pick up my template right when logstash makes a new index. – TristanMatthews Dec 09 '15 at 21:55

1 Answer


Specifying the mapping is decidedly the best practice.

Specifying a "text" or "keyword" type would not only prevent the error that you saw, but would have other beneficial effects in performance.

I would only recommend the Logstash json_encode filter if you knew the input was always JSON and for some reason didn't want it parsed into its constituent fields (for example, if it were very sparse, which would be bad for performance).
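
For completeness, a minimal sketch of that filter; json_encode is not bundled with Logstash by default and ships as the logstash-filter-json_encode plugin:

```
filter {
  json_encode {
    # Serialize the 'msg' field back into a JSON string in place.
    # An optional 'target' setting writes the string to a different field instead.
    source => "msg"
  }
}
```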

Steven Ensslen