I would like to use rsyslog to collect Apache logs and process them with Logstash.

The logs are received correctly by rsyslog, and then by Logstash, but I would like to extract the Apache log content from the message part of the rsyslog line.

For instance, here is the line received in logstash. The last part is the apache log.

2015-09-20T16:27:30.000Z 1.1.20.133 <173>Sep 20 16:27:30 ip-12-1-8-7 apache[26914]: 10.25.52.66 - - [20/Sep/2015:16:27:30 +0000] "GET / HTTP/1.1" 200 - "-" "Dalvik/1.6.0 (Linux; U; Android 4.2.2; MID Build/JDQ39)" "-"

I would like to extract the apache part and then parse it again.

10.25.52.66 - - [20/Sep/2015:16:27:30 +0000] "GET / HTTP/1.1" 200 - "-" "Dalvik/1.6.0 (Linux; U; Android 4.2.2; MID Build/JDQ39)" "-"

I guess this can be done with grok. Is it possible to apply a first grok filter that identifies the syslog line and extracts the syslog message, and then parse that message as an Apache log?

The filter I use to parse the rsyslog line is the following:

filter {
  grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
   }
}

Now, how can I use syslog_message to extract the Apache data? Do I need a single grok match, or can I do this in two steps: extract the syslog data, then parse the Apache lines with a second grok?

The following works, but I was wondering whether there is a better way that avoids the duplication:

filter {
 if [type] == "syslog" {
   grok {
     match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
     add_field => [ "received_at", "%{@timestamp}" ]
     add_field => [ "received_from", "%{host}" ]
   }
   grok {
     match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}  ${COMBINEDAPACHELOG}" }
   }
  }
}
tomsoft

1 Answer

You're very close!

In the second grok, you should use the syslog_message field as your input, and only the COMBINEDAPACHELOG as your pattern.

That's a good way to post-process a field with grok to extract more information from it, as you have done.
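As a minimal sketch of that two-step approach, assuming the same syslog pattern and the same `type` condition as in the question, the second grok reads from syslog_message rather than from message:

```
filter {
  if [type] == "syslog" {
    # Step 1: strip the syslog envelope and keep the payload in syslog_message.
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    # Step 2: parse only the payload as an Apache combined log line.
    grok {
      match => { "syslog_message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}
```

The syslog pattern is written only once this way. The names of the fields produced by COMBINEDAPACHELOG depend on the version of the grok patterns in use, so check the output of a test event before relying on them.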

Since the log file will only ever have one format in it, you can also combine the two groks into one:

 match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{COMBINEDAPACHELOG}" }
Alain Collins
  • I was hoping it would be possible to apply a second filter to syslog_message, so that other filters could be added the same way (log4j, for instance), but anyway, it works fine – tomsoft Sep 21 '15 at 06:25
  • In your pattern, syslog_message just matches the stuff at the end, which is the apache log info. As I mentioned, you can parse that new field with a separate grok (I'm a fan of stripping off the common stuff first), or skip the creation of the new field by putting the apache stuff at the end of the original grok. Hope that's more clear? – Alain Collins Sep 21 '15 at 07:47
  • Some info on common stuff: http://svops.com/blog/processing-common-event-information-with-grok/ – Alain Collins Sep 21 '15 at 07:48
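
The approach discussed in the comments (strip the common syslog prefix first, then parse the remainder separately) could be sketched roughly as follows. It assumes the program name arrives as `apache`, as in the sample line from the question. No log4j pattern is shown, because that would depend on the log4j layout in use.

```
filter {
  if [type] == "syslog" {
    # Common part: parse the syslog envelope once for every event.
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    # Program-specific part: choose a parser based on the extracted program name.
    if [syslog_program] == "apache" {
      grok {
        match => { "syslog_message" => "%{COMBINEDAPACHELOG}" }
      }
    }
    # Further branches (log4j, for example) would go here, each applying
    # its own pattern to syslog_message.
  }
}
```

Branching on syslog_program keeps each format's pattern separate, so adding a new source does not require touching the syslog pattern.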