
I have JSON in the form of

[
    {
        "foo":"bar"
    }
]

I am trying to filter it using the json filter in Logstash, but it doesn't seem to work: apparently a JSON list can't be parsed with the json filter. Can someone please tell me about a workaround for this?

UPDATE

My logs

IP - - 0.000 0.000 [24/May/2015:06:51:13 +0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium+S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT%2B05%3A30&events=%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%2C%22eA%22%3A%22app_activated%22%2C%22eTz%22%3A%22GMT%2B05%3A30%22%2C%22eT%22%3A%221432386324909%22%2C%22eL%22%3A%22packageName%22%7D%5D * "-" "-" "-"

The URL-decoded version of the above log is:

IP - - 0.000 0.000 [24/May/2015:06:51:13  0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT+05:30&events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}] * "-" "-" "-"

Please find below my config file for the above logs.

filter {
    urldecode {
        field => "message"
    }
    grok {
        match => ["message", '%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
    }
    kv {
        field_split => "&? "
    }
    json {
        source => "events"
    }
    geoip {
        source => "clientip"
    }
}

I need to parse the events, i.e. events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]

Keshav Agarwal

1 Answer


I assume that you have your JSON in a file. You are right, you cannot use the json filter directly. You'll have to use the multiline codec first and apply the json filter afterwards.

The following config works for your given input. However, you might have to change it in order to properly separate your events. It depends on your needs and the json format of your file.

Logstash config:

input {
    file {
        codec => multiline {
            pattern => "^\]" # Change to separate your events
            negate => true
            what => previous
        }
        path => ["/absolute/path/to/your/json/file"]
        start_position => "beginning"
        sincedb_path => "/dev/null" # This is just for testing
    }
}

filter {
    mutate {
        # Strip the opening bracket and the newlines inserted by the multiline codec
        gsub => [
            "message", "\[", "",
            "message", "\n", ""
        ]
    }
    json {
        source => "message"
    }
}

UPDATE

After your update I guess I've found the problem. Apparently you get a _jsonparsefailure because of the square brackets. As a workaround you can remove them manually. Add the following mutate filter after your kv filter and before your json filter:

mutate {
    gsub => [
        "events", "\]", "",
        "events", "\[", ""
    ]
}

UPDATE 2

Alright, assuming your input looks like this:

[{"foo":"bar"},{"foo":"bar1"}]

Here are 5 options:

Option a) ugly gsub

An ugly workaround would be another gsub:

gsub => [ "event","\},\{",","]

But this would remove the inner relations, so I guess you don't want to do that.

Option b) split

A better approach might be to use the split filter:

split {
    field => "event"
    terminator => ","
}
mutate {
    gsub => [
        "event", "\]", "",
        "event", "\[", ""
    ]
}
json {
    source => "event"
}

This would generate multiple events (the first with foo = bar and the second with foo = bar1).

Option c) mutate split

You might want to have all the values in one Logstash event. You can use the mutate split setting to generate an array and then parse the JSON of each entry if it exists. Unfortunately, you have to write a conditional for each entry because Logstash doesn't support loops in its config.

mutate {
    gsub => [
        "event", "\]", "",
        "event", "\[", ""
    ]
    split => [ "event", "," ]
}

json {
    source => "[event][0]"
    target => "[result][0]"
}

if [event][1] {
    json {
        source => "[event][1]"
        target => "[result][1]"
    }
    if [event][2] {
        json {
            source => "[event][2]"
            target => "[result][2]"
        }
    }
    # You would have to add more conditionals if you expect even more dictionaries
}

Option d) Ruby

According to your comment I tried to find a Ruby way. The following works (place it after your kv filter):

mutate {
    gsub => [
        "event", "\]", "",
        "event", "\[", ""
    ]
}

ruby {
    init => "require 'json'"
    code => "
        # Split the stripped string into one fragment per dictionary,
        # parse each fragment and collect every key/value pair
        # as its own single-entry hash
        e = event['event'].split(',')
        ary = Array.new
        e.each do |x|
            hash = JSON.parse(x)
            hash.each do |key, value|
                ary.push( { key => value } )
            end
        end
        event['result'] = ary
    "
}

Option e) Ruby

Use this approach after your kv filter (without setting a mutate filter):

ruby  {
    init => "require 'json'"
    code => "
            event['result'] = JSON.parse(event['event'])
    "
}

It will parse events like event=[{"name":"Alex","address":"NewYork"},{"name":"David","address":"NewJersey"}]

into:

"result" => [
    [0] {
           "name" => "Alex",
        "address" => "NewYork"
    },
    [1] {
           "name" => "David",
        "address" => "NewJersey"
    }

Due to the behavior of the kv filter, this does not support whitespace in the values. I hope you don't have any in your real inputs, do you?

hurb
  • I don't have my json in another file, I have my json in the logs, which are in this form: `192.168.1.1 - - 0.421 0.0000 [24/May/2015:06:51:33 +0000] *"POST event=[{"foo":"bar"}]`. This event is the json that I need to parse. There can also be multiple dictionaries in the event array. – Keshav Agarwal Aug 04 '15 at 08:06
  • What do you mean by *"in the logs"*? Which input do you use in logstash? Do the messages come through a network stream? How do you receive them? – hurb Aug 04 '15 at 08:30
  • Currently, I'm analyzing my nginx access logs. The input is of type `file`. The json string is received as a parameter, hence I need to parse it. – Keshav Agarwal Aug 04 '15 at 09:04
  • Okay... Do you already have an nginx-access grok pattern which parses your nginx-logs? You need to use the grok filter to get a field which contains your json string. It would be very helpful if you provide this information in your question. – hurb Aug 04 '15 at 09:17
  • I've tried this approach (workaround) earlier as well. However, it fails when you have a list of dictionaries, i.e. `[{"foo":"bar"},{"foo":"bar1"}]`. – Keshav Agarwal Aug 04 '15 at 10:00
  • Is there any workaround for a json having a list of dictionaries? I've been trying to find one for the past 2 days, but haven't got anything. – Keshav Agarwal Aug 04 '15 at 10:01
  • I've however tried http://kapaski.github.io/blog/2014/07/24/logstash-to-parse-json-with-json-arrays-in-values/ , but I'm not able to customize it to my own needs (no experience with Ruby). Do you have any other workaround? Thanks :) – Keshav Agarwal Aug 04 '15 at 10:22
  • Thanks @hurb for a great explanation and various options. However, I tried all the options, but didn't get the expected result. Here are the bugs which were observed after trying the given options: `Option a)` breaks the inner relationships. `Option b)` also breaks the inner relationships, since the `events` field was split by `,`, therefore `,` inside the dictionaries were also considered for the split. – Keshav Agarwal Aug 04 '15 at 12:20
  • In `Option c)` and `Option d)`, the keys were getting overwritten. For example, if I had `events=[{'foo':'bar'},{'foo':'bar1'}]`, the final value of `events['foo']` will be `bar1`. I rather want both the events, i.e. `foo` with value `bar` and `bar1`. – Keshav Agarwal Aug 04 '15 at 12:22
  • I've edited options c) and d) to suit your purpose. They won't overwrite the values anymore. – hurb Aug 04 '15 at 12:39
  • But would I then be able to show the array values on Kibana? I mean, if I have events like `events=[{'name':'Alex','address':'New York'},{'name':'David','address':'New Jersey'}]`, as per the updated code, Logstash will return `event['name'] => ['Alex','David']` and `event['address'] => ['New York','New Jersey']`. Now, if I fire a query on Kibana to fetch users living in `New York`, will it return `Alex` or not? – Keshav Agarwal Aug 04 '15 at 12:46
  • ie will there still remain mapping of event values? Thanks for your help :) – Keshav Agarwal Aug 04 '15 at 12:46
  • See option e). I think it is the cleanest approach so far. Hope this finally does what you want. – hurb Aug 04 '15 at 13:08
  • This is the best approach of doing this. Thanks a lot. :) However, when I tried to visualize the data in Kibana by creating a table, it never showed me the `name` or `address` field in the dropdown. – Keshav Agarwal Aug 04 '15 at 13:29
  • I think this is another problem regarding nested objects in elasticsearch and kibana. Google will give you a lot of issues about that topic on SO or GitHub. – hurb Aug 04 '15 at 13:40