I have an issue with Elasticsearch and Logstash. My objective is to automatically send logs into Elasticsearch with Logstash.
My raw logs look like this:
2016-09-01T10:58:41+02:00 INFO (6): 165.225.76.76 entreprise1 email1@gmail.com POST /application/controller/action Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko {"getid":"1"} 86rkt2dqsdze5if1bqldfl1
2016-09-01T10:58:41+02:00 INFO (6): 165.225.76.76 entreprise2 email2@gmail.com POST /application/controller2/action2 Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko {"getid":"2"} 86rkt2rgdgdfgdfgeqldfl1
2016-09-01T10:58:41+02:00 INFO (6): 165.225.76.76 entreprise3 email3@gmail.com POST /application/controller2/action2 Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko {"getid":"2"}
The problem is that I don't want to insert my logs in this form. I want to use a preprocessing script in Python to transform my data before injecting it into Elasticsearch with Logstash.
At first, I wanted to index into Elasticsearch using only a Python script. But I have a huge quantity of logs, split across many folders and files and constantly updated, so I think it is much more robust to use Logstash or Filebeat. I tried Filebeat with a grok filter (not enough for my case), but I think it's impossible to run a preprocessing script before indexing.
Logs should look like this at the end of the Python script:
{"page": "/application/controller/action", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action", "client": "entreprise1", "email": "email1@gmail.com", "feature": "application_controller_action", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller", "application": "application"}
{"page": "/application/controller2/action2", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action2", "client": "entreprise2", "email": "email2@gmail.com", "feature": "application_controller2_action2", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller2", "application": "application"}
{"page": "/application3/controller/action3", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action3", "client": "entreprise3", "email": "email3@gmail.com", "feature": "application_controller3_action3", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller3", "application": "application"}
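Here is a first sketch of the preprocessing script I have in mind. The regex and the way I split the URL into application/controller/action are my own assumptions based on the sample lines above; the trailing session token is ignored:

```python
import json
import re
import sys

# Assumed structure, derived from the sample lines:
# timestamp level (n): ip client email method page browser {params} [session]
LOG_RE = re.compile(
    r'^(?P<time>\S+) \S+ \(\d+\): '
    r'(?P<ip>\S+) (?P<client>\S+) (?P<email>\S+) '
    r'(?P<method>\S+) (?P<page>\S+) '
    r'(?P<browser>.*?) (?P<params>\{.*?\})'
)

def parse_line(line):
    """Turn one raw log line into a dict, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    # Split "/application/controller/action" into its three parts
    parts = d["page"].strip("/").split("/")
    application, controller, action = (parts + ["", "", ""])[:3]
    return {
        "page": d["page"],
        "ip": d["ip"],
        "browser": d["browser"],
        "action": action,
        "client": d["client"],
        "email": d["email"],
        "feature": "_".join(parts),
        # "2016-09-01T10:58:41+02:00" -> "2016-09-01 10:58:41" (offset dropped)
        "time": d["time"][:19].replace("T", " "),
        "method": d["method"],
        "controller": controller,
        "application": application,
    }

if __name__ == "__main__":
    # Read raw lines on stdin, emit one JSON document per line on stdout
    for raw in sys.stdin:
        doc = parse_line(raw)
        if doc is not None:
            print(json.dumps(doc))
</imports>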
I'm struggling with how to plug a Python script into a Logstash filter. I know this kind of in-pipeline transformation can be done, but as far as I can tell it is done with a Ruby script (cf: https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html).
1) Do you think it is possible to solve my problem using Logstash?
2) If yes, should my Python script take a raw log line as input and produce the JSON-formatted line as output?
3) When a line is appended to a log file, the entire file is re-inserted each time. How can I handle this?
4) Do you think it's possible to do this with Filebeat? And in your opinion, which would be best for my case?
For now, my Logstash configuration file looks like this:
input {
  file {
    path => "/logpath/logs/*/*.txt"
    start_position => "beginning"
  }
}

filter {
  # Here is where I should use my script to transform my logs into the JSON format I need
  date {
    match => [ "time", "YYYY-MM-dd HH:mm:ss" ]
  }
  geoip {
    source => "ip"
    target => "geoip"
  }
}

output {
  stdout {
    codec => dots {}
  }
  elasticsearch {
    index => "logs_index"
    document_type => "logs"
    template => "./logs_template.json"
    template_name => "logs_test"
    template_overwrite => true
  }
}
I want to thank in advance anyone who takes the time to help me out with this.
Dimitri
PS: Sorry for the syntax; English is not my first language.