I'm new to Logstash and am trying to use it to parse an HTML log file. I need to output only the log lines, i.e. ignore the JS, CSS and HTML that precede them in the file. A log line in the file looks like this:

<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>

I have no problem getting all the lines, but I would like output that contains only the relevant ones, without HTML tags, and looks something like this:

{
  "db_timestamp": "2015-01-28 13:52:25.692",
  "server_timestamp": "2015-01-28 13:52:23.950",
  "node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
  "thread": "REST",
  "user": "sa",
  "ip": "0.0.0.0",
  "level": "ERR",
  "method": "ProjectValidator.validate(36)",
  "message": "Project does not exist"
}

My Logstash configuration is:

input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}
filter {
  if [type] == "log" {
    grok {
        match => [ WHAT SHOULD I PUT HERE??? ]  
    }
  }
}
output {
  stdout {}
  if [type] == "request" {
    http {
        http_method => "post"
        url => "http://<some url>"
        mapping =>  ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
        http_method => "post"
        url => "http://<some url>"
        mapping =>  [ ALSO WHAT SHOULD I PUT HERE??? ]
    }
  }
}

Is there a way to do that? So far I haven't found any relevant documentation or samples.

Thanks!

burgi

2 Answers

Finally figured out the answer.

Not sure this is the best or most elegant solution, but it works.

I changed the http output format to "message", which let me build the whole request body as JSON myself instead of using mapping. I also found out how to name the captured fields in the grok filter and reference them in the output.

This is the new Logstash configuration file:

input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}

filter {
  if [type] == "log" {
    grok {
      match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
    }
  }
}

output {
  stdout { codec => rubydebug }
  if [type] == "request" {
    http {
        http_method => "post"
        url => "http://<some URL>"
        mapping =>  ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
        format => "message"
        content_type => "application/json"
        http_method => "post"
        url => "http://<some URL>"
        message => '{
            "db_date": "%{db_date}",
            "alm_date": "%{alm_date}",
            "thread": "%{thread}",
            "req_type": "%{req_type}",
            "username": "%{username}",
            "ip": "%{ip}",
            "level": "%{level}",
            "method": "%{method}",
            "message": "%{err_message}"
        }'
    }
  }
}

Note the single quotes around the http message block and the double quotes around the keys and values inside it.
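
The question also asked to ignore the JS, CSS and HTML lines in the file, which the configuration above does not do by itself. One possible way is to drop every event the grok filter could not match; grok tags those with _grokparsefailure by default. A minimal sketch, assuming it is placed inside the existing if [type] == "log" block, right after the grok filter:

```
# Assumption: this goes inside `if [type] == "log" { ... }`, after grok.
# Events that did not match the <tr>...</tr> pattern carry the default
# failure tag and are discarded before they reach any output.
if "_grokparsefailure" in [tags] {
  drop { }
}
```

With this in place, only lines that matched the table-row pattern should be posted by the http output; events of type "request" are not affected because the check sits inside the "log" branch.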

burgi

For anyone parsing HP ALM logs, the following Logstash filter does the job:

grok {
    break_on_match => true
    match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
}
mutate {
    # Join month and day into one field, then drop the intermediate fields.
    add_field => {
        "db_date" => "%{db_date_mon} %{db_date_day}"
        "alm_date" => "%{alm_date_mon} %{alm_date_day}"
    }
    remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
    # Replace <br> with a literal line break (the quoted string spans two
    # lines on purpose) and <p> with three spaces.
    gsub => [
        "log_message", "<br>", "
",
        "log_message", "<p>", "   "
    ]
}

Tested and working fine with Logstash 2.4.0.
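
The question asked for full timestamps such as "db_timestamp", while the filter above leaves the date and the time in separate fields. A hedged sketch of one way to combine them with the date filter follows. The field name db_timestamp comes from the question, not from this answer, and the sketch assumes it runs after the mutate block above, so that db_date ("Jan 28") and db_date_time ("13:52:25.692") both exist:

```
# Assumption: placed after the grok and mutate filters shown above.
mutate {
    # Build a single string such as "Jan 28 13:52:25.692".
    add_field => { "db_timestamp" => "%{db_date} %{db_date_time}" }
}
date {
    # The log lines carry no year; the date filter is expected to fall
    # back to the current year in that case.
    match => [ "db_timestamp", "MMM dd HH:mm:ss.SSS", "MMM d HH:mm:ss.SSS" ]
    # Store the parsed value back into db_timestamp instead of @timestamp.
    target => "db_timestamp"
}
```

The same pair of filters could be repeated for alm_date and alm_date_time. The parsed value is interpreted in the Logstash host's time zone unless the date filter's timezone option is set.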