I have an issue with Elasticsearch and Logstash. My objective is to automatically ship logs into Elasticsearch with Logstash.

My raw logs look like this:

2016-09-01T10:58:41+02:00 INFO (6):     165.225.76.76   entreprise1 email1@gmail.com    POST    /application/controller/action Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko    {"getid":"1"}   86rkt2dqsdze5if1bqldfl1
2016-09-01T10:58:41+02:00 INFO (6):     165.225.76.76   entreprise2 email2@gmail.com    POST    /application/controller2/action2    Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   {"getid":"2"}   86rkt2rgdgdfgdfgeqldfl1
2016-09-01T10:58:41+02:00 INFO (6):     165.225.76.76   entreprise3 email3@gmail.com    POST    /application/controller2/action2    Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   {"getid":"2"}

The problem is that I don't want to insert my logs in this form. I want to use a preprocessing script in Python to transform my data before injecting it into Elasticsearch with Logstash.
At first, I wanted to push logs into Elasticsearch using only a Python script. But I have a huge quantity of logs spread across many folders and files, constantly updated, so I think it is far more robust to use Logstash or Filebeat. I tried Filebeat with a grok filter (not enough for my case), but I think it is impossible to run a preprocessing script before indexing.

Logs should look like this at the end of the Python script:

{"page": "/application/controller/action", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action", "client": "entreprise1", "email": "email1@gmail.com", "feature": "application_controller_action", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller", "application": "application"} 
{"page": "/application/controller2/action2", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action2", "client": "entreprise2", "email": "email2@gmail.com", "feature": "application_controller2_action2", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller2", "application": "application"} 
{"page": "/application3/controller/action3", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action3", "client": "entreprise3", "email": "email3@gmail.com", "feature": "application_controller3_action3", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller3", "application": "application"}

I'm struggling with plugging a Python script into the Logstash filter stage. I know something like this can be implemented, but basically it's done with a Ruby script (cf: https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html).
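
For illustration, here is a minimal sketch of that Ruby-filter approach, doing part of what my Python script does; it assumes a "page" field was already extracted (e.g. by a grok filter), and the parsing is only illustrative:

filter {
  ruby {
    code => "
      # Sketch: derive application/controller/action and 'feature'
      # from a 'page' field shaped like '/application/controller/action'.
      parts = (event.get('page') || '').split('/').reject(&:empty?)
      if parts.length == 3
        event.set('application', parts[0])
        event.set('controller',  parts[1])
        event.set('action',      parts[2])
        event.set('feature',     parts.join('_'))
      end
    "
  }
}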

1) Do you think it is possible to solve my problem using Logstash?

2) If yes, should my Python script take a raw log line as input and return the JSON-formatted line as output?

3) When a line is appended to a log file, the entire file gets reinserted each time. How can I handle this?

4) Do you think it is possible to do this with Filebeat alone? And in your opinion, what would be best for my case?

For now, my Logstash configuration file looks like this:

input {
  file {
    path => "/logpath/logs/*/*.txt"
    start_position => "beginning"
  }
}

filter {
  # Here is where I should use my script to transform my logs into my json needed format
  date {
    match => ["time", "YYYY-MM-dd HH:mm:ss" ]
  }

  geoip {
    source => "ip"
    target => "geoip"
  }
}

output {
  stdout  {
    codec => dots {}
  }

  elasticsearch {
    index => "logs_index"
    document_type => "logs"
    template => "./logs_template.json"
    template_name => "logs_test"
    template_overwrite => true
  }

}

I really want to thank in advance anyone who helps me out and considers my request.

Dimitri

PS: Sorry for the syntax, English is not my first language.

1 Answer

The standard way to convert logs to JSON format is to use the grok and json filters in the Logstash configuration, and in order to reduce the load on Logstash, Filebeat can be used alongside it.
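
For example, if the lines that reach Logstash are already JSON (as in the desired output above), a json filter alone decodes them into fields; a minimal sketch:

filter {
  json {
    # Parse the raw event line as JSON and promote its keys to fields.
    source => "message"
  }
}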

Hence, the best setup to solve this problem is the filebeat -> logstash -> Elasticsearch stack.

You do not need the Python script; instead, use Filebeat to collect all the logs from a specific place and forward them to Logstash.

Install Filebeat on the server where all the logs accumulate; it helps if you direct all logs into a specific folder. Install Filebeat first and then set up the configuration to forward logs to Logstash.

Here is the filebeat configuration:

filebeat:
  prospectors:
    -
      paths:
        - "*log_path_of_all_your_log_files*"
      input_type: log
      # The json.* options only apply if the lines are already JSON;
      # for raw text logs, drop them and let Logstash do the parsing.
      json.message_key: statement
      json.keys_under_root: true

  idle_timeout: 1s
  # The registry remembers how far each file has been read,
  # so appended lines are shipped only once (see question 3).
  registry_file: /var/lib/filebeat/registry

output:
  logstash:
    hosts: ["*logstash-host-ip:5044*"]
    worker: 4
    bulk_max_size: 1024

logging:
  files:
    rotateeverybytes: 10485760 # = 10MB
    level: debug

Now, along with your Logstash configuration, you need a grok filter to convert your logs to JSON format (make the changes in the Logstash configuration file) and then forward them to Elasticsearch, Kibana, or wherever you want.
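
As a rough sketch, assuming the raw fields are single-tab separated as in your sample lines (the pattern will almost certainly need tuning against real data):

filter {
  grok {
    # Raw line: ISO8601 timestamp, log level, then tab-separated fields.
    match => {
      "message" => "%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:level} \(%{INT:pid}\):\s+%{IP:ip}\t%{DATA:client}\t%{DATA:email}\t%{WORD:method}\t%{URIPATH:page}\t%{DATA:browser}\t%{DATA:payload}(\t%{GREEDYDATA:session})?$"
    }
  }

  # Split "page" into its three components.
  grok {
    match => { "page" => "^/%{WORD:application}/%{WORD:controller}/%{WORD:action}" }
  }
}

Note that with this route the time field keeps its raw ISO8601 form, so the date filter in your configuration should match on ["time", "ISO8601"] rather than "YYYY-MM-dd HH:mm:ss".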

  • Thank you very much for your answer, I never thought about using both Logstash and Filebeat; I'm going to give it a try. My problem with grok is that I don't know how to build a custom field like the one I have called "feature". This field is a mix of 3 other fields joined with "_" (application_controller_action). Is it possible to do something like that? I was already asking people to help me with the grok regex when I was trying with Filebeat; here is my post (https://stackoverflow.com/questions/50816580/what-should-be-the-grok-pattern-for-thoses-logs-ingest-pipeline-for-filebeat) Thanks – denis jardot Jun 19 '18 at 11:52
  • I also wanted to know: what is the point of using both? What does Logstash bring that Filebeat does not? And vice versa? – denis jardot Jun 19 '18 at 11:56
  • In addition, grok alone is not enough in my case. Indeed, my Python script is also used as a field cleaner. I have a lot of processing that modifies the field "page" ("/application/controller/action") according to what I have in the "application" part, the "controller" part, and the "action" part, and I think that whole process can't be done in a grok filter. – denis jardot Jun 19 '18 at 12:18
  • I understand that the grok filter can be tricky, but since you are using Logstash, you should try to make the most of its features. That being said, yes, it is possible to mix certain fields and combine them into one (see the sketch after these comments). I will go over the link you provided and suggest changes. – Ayush Verma Jun 19 '18 at 12:31
  • It has been a debate for some time now regarding Logstash vs Filebeat, because at the end of the day both forward logs. But it boils down to performance. Filebeat is preferred over Logstash for forwarding as it is lightweight, supports SSL and TLS encryption, supports back pressure with a good built-in recovery mechanism, and is extremely reliable. Logstash then formats the logs forwarded by Filebeat and displays them in Kibana or any other GUI. – Ayush Verma Jun 19 '18 at 12:34
  • Well, I assume that your Python script is doing an amazing job. But grok combined with other filters can perform a similar task with much more ease. Anyway, I have tried my best to point you in the right direction and the best stack to follow; you can explore further. – Ayush Verma Jun 19 '18 at 12:37
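
For instance, combining existing fields into one (the "feature" field asked about in the first comment) needs no script at all; a sketch using mutate's sprintf-style field references, assuming application, controller, and action were already extracted by grok:

filter {
  mutate {
    # Build "feature" by joining three existing fields with "_".
    add_field => { "feature" => "%{application}_%{controller}_%{action}" }
  }
}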