
I am trying to parse Tomcat logs and send the output to Elasticsearch. More or less it works, but when I look at the indexed data in Elasticsearch, it contains many documents whose tags field is _grokparsefailure, which results in a lot of duplicate data. To avoid this I tried to drop the event if tags contains _grokparsefailure; this conditional is written in the logstash.conf file below the grok filter. Still, the output in Elasticsearch contains indexed documents whose tags include _grokparsefailure. If grok fails, I don't want that event to reach Elasticsearch at all, since it duplicates the data.

My logstash.conf file is:

input {
  file {
    path => "/opt/elasticSearch/logstash-1.4.2/input.log"
    codec => multiline {
      pattern => "^\["
      negate => true
      what => previous
    }
    start_position => "end"
  }
}

filter {
  grok {
    match => [
      "message", "^\[%{GREEDYDATA}\] %{GREEDYDATA} Searching hotels for country %{GREEDYDATA:country}, city %{GREEDYDATA:city}, checkin %{GREEDYDATA:checkin}, checkout %{GREEDYDATA:checkout}, roomstay %{GREEDYDATA:roomstay}, No. of hotels returned is %{NUMBER:hotelcount} ."
    ]
  }

  if "_grokparsefailure" in [tags] {
    drop { }
  }
}

output {
  file {
    path => "/opt/elasticSearch/logstash-1.4.2/output.log"
  }

  elasticsearch {
    cluster => "elasticsearchdev"
  }
}

Elasticsearch response from http://172.16.37.97:9200/logstash-2015.12.23/_search?pretty=true:

The output below contains three documents; the first one has _grokparsefailure in its _source -> tags field. I don't want it in this output, so I probably need to filter it out in Logstash so that it never reaches Elasticsearch.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2015.12.23",
      "_type" : "logs",
      "_id" : "J6CoEhKaSE68llz5nEbQSQ",
      "_score" : 1.0,
      "_source":{"message":"[2015-12-23 12:08:40,124] ERROR http-80-5_@{AF3AF784EC08D112D5D6FC92C78B5161,127.0.0.1,1450852688060} com.mmt.hotels.web.controllers.search.HotelsSearchController - Searching hotels for country IN, city DEL, checkin 28-03-2016, checkout 29-03-2016, roomstay 1e0e, No. of hotels returned is 6677 .","@version":"1","@timestamp":"2015-12-23T14:17:03.436Z","host":"ggn-37-97","path":"/opt/elasticSearch/logstash-1.4.2/input.log","tags":["_grokparsefailure"]}

    }, {
      "_index" : "logstash-2015.12.23",
      "_type" : "logs",
      "_id" : "2XMc6nmnQJ-Bi8vxigyG8Q",
      "_score" : 1.0,
      "_source":{"@timestamp":"2015-12-23T14:17:02.894Z","message":"[2015-12-23 12:08:40,124] ERROR http-80-5_@{AF3AF784EC08D112D5D6FC92C78B5161,127.0.0.1,1450852688060} com.mmt.hotels.web.controllers.search.HotelsSearchController - Searching hotels for country IN, city DEL, checkin 28-03-2016, checkout 29-03-2016, roomstay 1e0e, No. of hotels returned is 6677 .","@version":"1","host":"ggn-37-97","path":"/opt/elasticSearch/logstash-1.4.2/input.log","country":"IN","city":"DEL","checkin":"28-03-2016","checkout":"29-03-2016","roomstay":"1e0e","hotelcount":"6677"}

    }, {
      "_index" : "logstash-2015.12.23",
      "_type" : "logs",
      "_id" : "fKLqw1LJR1q9YDG2yudRDw",
      "_score" : 1.0,
      "_source":{"@timestamp":"2015-12-23T14:16:12.684Z","message":"[2015-12-23 12:08:40,124] ERROR http-80-5_@{AF3AF784EC08D112D5D6FC92C78B5161,127.0.0.1,1450852688060} com.mmt.hotels.web.controllers.search.HotelsSearchController - Searching hotels for country IN, city DEL, checkin 28-03-2016, checkout 29-03-2016, roomstay 1e0e, No. of hotels returned is 6677 .","@version":"1","host":"ggn-37-97","path":"/opt/elasticSearch/logstash-1.4.2/input.log","country":"IN","city":"DEL","checkin":"28-03-2016","checkout":"29-03-2016","roomstay":"1e0e","hotelcount":"6677"}

    } ]
  }
}


Dev Gosain

2 Answers


What you could try is to test for _grokparsefailure in the output section, like this:

output {
  if "_grokparsefailure" not in [tags] {
    file {
      path => "/opt/elasticSearch/logstash-1.4.2/output.log"
    }

    elasticsearch {
      cluster => "elasticsearchdev"
    }
  }
}
Val
  • Tried this but it's not working. The output still has _grokparsefailure in tags. – Dev Gosain Dec 24 '15 at 06:48
  • Did you wipe your index before trying? i.e. are you sure you're looking at new documents and not old ones? – Val Dec 24 '15 at 06:49
  • Yes, I did. I used this command to delete the index: curl -XDELETE 'http://localhost:9200/logstash-2015.12.24' – Dev Gosain Dec 24 '15 at 06:57
  • Ok, and can you make sure you're not looking at documents from past indices, i.e. `logstash-2015-12-23`, `logstash-2015-12-22`, etc.? – Val Dec 24 '15 at 07:00
  • Documents are for the current day only. I amended my input.log with some dummy text to make sure it's picking up the latest changes. This dummy text is still coming through in the output. – Dev Gosain Dec 24 '15 at 07:08
  • I still have the same issue. Has anyone found a solution for this? – Mika Andrianarijaona Jun 28 '16 at 05:18

Sometimes you have multiple grok filters, and some of them are expected to fail for some events while matching the rest. In that case, dropping events based on _grokparsefailure will not solve the problem.

example:

input {
  some input
}

filter {
  grok1 { extract ip to my_ip1 }
  grok2 { extract ip to my_ip2 }
  grok3 { extract ip to my_ip3 }
}

output {
  # This will not write to output if any single grok fails
  if "_grokparsefailure" not in [tags] {
    some output
  }
}

The workaround I have is to filter based on the values of the extracted fields instead. Is there a better way to do this? Example:

if "10." in [ip1] or "10." in [ip2] or "10." in [ip3] {
  drop {}
}
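A cleaner alternative might be grok's `tag_on_failure` option, which replaces the default `_grokparsefailure` tag with one of your choosing. The sketch below (the tag names and patterns are illustrative, not from the original configs) gives each grok its own failure tag, so the output conditional can keep any event where at least one grok matched:

```
filter {
  grok {
    match => ["message", "some pattern 1"]    # illustrative pattern
    tag_on_failure => ["grok1_failure"]       # custom tag instead of _grokparsefailure
  }
  grok {
    match => ["message", "some pattern 2"]
    tag_on_failure => ["grok2_failure"]
  }
  grok {
    match => ["message", "some pattern 3"]
    tag_on_failure => ["grok3_failure"]
  }
}

output {
  # Keep the event unless every single grok failed
  if "grok1_failure" not in [tags] or "grok2_failure" not in [tags] or "grok3_failure" not in [tags] {
    some output
  }
}
```

This way an event that matches only grok2, for example, still reaches the output, while events that match none of the patterns are silently skipped.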
NinjaSolid