
In my project I provide an API for a mobile app. In every API call the front end sends a session_id to identify the user, and the server accepts and validates it.

Recently we decided to use ELK (Elasticsearch, Logstash, Kibana) to store and analyze our web server access logs and extract common user activities. I ran into a problem: I want to replace the session_id in the log with the user_id (in the program I can get the user_id from the session_id by querying the database), but I don't know how.

Can a Logstash filter do this, or should I change the data when the log is indexed in Elasticsearch?

Kevin Yan
  • If I understand you correctly, you don't have a `user_id` field in your logs and you want to add one? Do you have some kind of api from which you can retrieve the `user_id` (e.g. a rest api)? – hurb Aug 11 '15 at 09:51
  • The log lines look something like this: `POST /wiki/wiki_list?session_id=032ce12841c7d0378be12f18f8e030f404a91c76&tag_id=1280 HTTP/1.1` In the program I retrieve `session_id` from the post fields and use it to find the corresponding `user_id` – Kevin Yan Aug 11 '15 at 12:41
  • Okay, and is it possible to do that outside the program? I.e. is there an api which receives the `session_id` and sends back the `user_id`? You will need an interface from which you can gather the `user_id` information. – hurb Aug 11 '15 at 13:11
  • There is currently a common function in my program that gets the corresponding `user_id`; in any case, I can write a separate interface to do this outside the program. – Kevin Yan Aug 11 '15 at 14:57
  • Suppose that you write an api which converts your `session_id` into a `user_id` (e.g. via http post). You could then use a logstash filter which calls this api automatically and adds your `user_id` to each event. Let me know if you need help with the filter. But first, you need to set up your converting function to be externally available. – hurb Aug 12 '15 at 09:44
  • Yeah, a filter that converts `session_id` into `user_id` is exactly what I was trying to ask about. I'm very new to `logstash`, so I really need some help figuring out how to use a `logstash` filter for this. Thanks so much for your analysis of my question. Could you give me some direction? – Kevin Yan Aug 12 '15 at 14:32

1 Answer


Alright, I'll try to give you an answer, assuming that you have some kind of interface from which you can retrieve the user_id. You need to do two things:

  1. Split your log line into separate fields to have a field which contains your session_id
  2. Get the corresponding user_id using some kind of api

Split your log line

You need to split your input into separate fields. This could be done with filters like grok and/or kv. Take a look at some SO questions to find a matching grok pattern or use the grok debugger. Please provide a few log lines if you need help with that.

EDIT: For your given examples your configuration should look something like this:

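filter {
    # Parse the access-log line into verb, request, response code, etc.
    grok {
        match => [ 'message', '"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) %{QS:agent} %{QS:xforwardedfor}' ]
    }
    # Split the request on "&" and "?" to get key=value pairs such as session_id.
    # Reading from "request" keeps the rest of the log line out of the last value.
    kv {
        source => "request"
        field_split => "&?"
    }
}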

Please try it and adjust it yourself to get the session_id.

Once you have a field called session_id you can go on with step 2.
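To check what the grok and kv filters should produce before wiring up Logstash, it can help to reproduce the split outside of it. The following is only a rough Python sketch of the same idea, not what Logstash runs internally: the regular expression and the function name `extract_query_fields` are made up for illustration, and the sample line is the one from the question's comments. Note that `parse_qs` also URL-decodes the values.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Sample request line taken from the question's comments.
LOG_LINE = (
    "POST /wiki/wiki_list?session_id=032ce12841c7d0378be12f18f8e030f404a91c76"
    "&tag_id=1280 HTTP/1.1"
)

# Illustrative pattern (an assumption, not the grok pattern itself):
# captures the HTTP verb, the request path with its query string, and the version.
REQUEST_RE = re.compile(
    r"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)"
)


def extract_query_fields(log_line):
    """Return the query-string parameters of the request as a flat dict."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return {}
    query = urlsplit(match.group("request")).query
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


fields = extract_query_fields(LOG_LINE)
print(fields["session_id"])
```

If the `session_id` field that ends up in Logstash differs from what this prints for the same line, the grok pattern or the kv settings probably need adjusting.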

Get the user_id

As you have already mentioned, you need a filter plugin, because the lookup needs the session_id, which is only available once the event has been parsed. There are several official plugins, but I think none of them suits your purpose. Since the session_id is assigned dynamically, you cannot use a static translate filter or something like that.

It depends on your API, but one possible approach is to get the corresponding user_id via HTTP requests. For that purpose you could use a community plugin, for example logstash-filter-rest, with a config like this:

filter {
    rest {
        url => "http://yourserver/getUserBySessionId/"
        sprintf => true
        method => "post"
        params => {                      
            "session_id" => "%{session_id}"        
        }
        response_key => "user_id"
    }
}
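
The config above assumes that something answers at that URL. As a rough idea of what the server side could look like, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the in-memory `SESSIONS` table stands in for the real database query, the names `lookup_user_id`, `build_response` and `serve` are hypothetical, and both the form-encoded request body and the JSON response shape are guesses, so check the plugin's README for what it actually sends and expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical stand-in for the database: session_id -> user_id.
SESSIONS = {"032ce12841c7d0378be12f18f8e030f404a91c76": 42}


def lookup_user_id(session_id):
    """Return the user_id for a session_id, or None if it is unknown."""
    return SESSIONS.get(session_id)


def build_response(form_body):
    """Turn a form-encoded request body into (HTTP status, JSON payload)."""
    fields = parse_qs(form_body)
    session_id = fields.get("session_id", [""])[0]
    user_id = lookup_user_id(session_id)
    if user_id is None:
        return 404, json.dumps({"error": "unknown session_id"})
    return 200, json.dumps({"user_id": user_id})


class LookupHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        status, payload = build_response(body)
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8080):
    """Start the lookup service; blocks until interrupted."""
    HTTPServer((host, port), LookupHandler).serve_forever()
```

Calling `serve()` starts the service locally; the `url` in the filter config would then point at that host and port.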
hurb
  • Thank you for your insights and kind help. The log lines look like this: `"POST /memo_set/create?course_id=15&native_set_id=1439437893&set_name=%E6%97%A5%E5%B8%B8%E6%98%93%E9%94%99100%E5%AD%97%EF%BC%88%E6%89%BE%E9%94%99%E5%AD%97%EF%BC%89&create_time=1439437893&last_edit_time=1439437893&session_id=2647fc9fb9fe9c569019775904aca835e45daa7f HTTP/1.1" 200 129 "-" "Dalvik/2.1.0 (Linux; U; Android 5.0; vivo X5Pro D Build/LRX21M)" "-"` – Kevin Yan Aug 13 '15 at 04:25
  • See my edit. Should be a good start. But please try to figure out further steps yourself. It is not that complicated ;) – hurb Aug 13 '15 at 09:32
  • Yeah, I will ask you if I get stuck while using the community plugin you recommended above. Thank you again for your kind help, have a good day :-) – Kevin Yan Aug 13 '15 at 10:04
  • Hi hurb, do you have some time? I have a problem with logstash-filter-rest; you can see my description on this question page [link](http://stackoverflow.com/questions/33250615/logstash-filter-rest-sent-field-references-incorrectly-it-always-reference-first) – Kevin Yan Oct 21 '15 at 09:02