I would like to use rsyslog to collect Apache logs and process them with Logstash.
The logs are received correctly by rsyslog and then by Logstash, but I would like to extract the Apache log line from the message part of the rsyslog event.
For instance, here is a line as received in Logstash. The last part is the Apache log.
2015-09-20T16:27:30.000Z 1.1.20.133 <173>Sep 20 16:27:30 ip-12-1-8-7 apache[26914]: 10.25.52.66 - - [20/Sep/2015:16:27:30 +0000] "GET / HTTP/1.1" 200 - "-" "Dalvik/1.6.0 (Linux; U; Android 4.2.2; MID Build/JDQ39)" "-"
I would like to extract the apache part and then parse it again.
10.25.52.66 - - [20/Sep/2015:16:27:30 +0000] "GET / HTTP/1.1" 200 - "-" "Dalvik/1.6.0 (Linux; U; Android 4.2.2; MID Build/JDQ39)" "-"
I guess this can be done with grok. Is it possible to use a first grok filter to identify the syslog line and extract the syslog message, and then parse that message as an Apache log?
The filter I use to parse the rsyslog line is the following:
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
}
Now, how can I use syslog_message to extract the Apache data? Do I need to do it all in a single grok match, or can I do this in two steps: extract the syslog data first, and then parse the Apache lines with a second grok?
The following works, but I was wondering if there is something better that avoids duplicating the syslog pattern:
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message} %{COMBINEDAPACHELOG}" }
    }
  }
}
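For comparison, the two-step approach I have in mind could be sketched roughly as below. This is an untested sketch, not a confirmed solution: it assumes the second grok can simply take the syslog_message field produced by the first one as its source, and that Apache events can be recognized by syslog_program being "apache" (as in the sample line above).

```
filter {
  if [type] == "syslog" {
    # Step 1: parse the syslog envelope; the payload ends up in syslog_message
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    # Step 2: parse only the extracted payload as an Apache combined log line
    # (assumption: Apache events carry syslog_program == "apache")
    if [syslog_program] == "apache" {
      grok {
        match => { "syslog_message" => "%{COMBINEDAPACHELOG}" }
      }
    }
  }
}
```

This way the syslog pattern is written only once, and the second grok matches against syslog_message instead of the whole message.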