
I am trying to filter Kafka events from multiple topics, but once all events from one topic have been consumed, Logstash is no longer able to fetch events from the other Kafka topic. I am using topics with 3 partitions and a replication factor of 2. Here is my Logstash config file:

input {
    kafka {
        auto_offset_reset => "smallest"
        consumer_id => "logstashConsumer1"
        topic_id => "unprocessed_log1"
        zk_connect => "192.42.79.67:2181,192.41.85.48:2181,192.10.13.14:2181"
        type => "kafka_type_1"
    }
    kafka {
        auto_offset_reset => "smallest"
        consumer_id => "logstashConsumer1"
        topic_id => "unprocessed_log2"
        zk_connect => "192.42.79.67:2181,192.41.85.48:2181,192.10.13.14:2181"
        type => "kafka_type_2"
    }
}
filter {
    if [type] == "kafka_type_1" {
        csv {
            separator => " "
            source => "data"
        }
    }
    if [type] == "kafka_type_2" {
        csv {
            separator => " "
            source => "data"
        }
    }
}
output {
    stdout { codec => rubydebug { metadata => true } }
}
Abhijeet

3 Answers


It's a very late reply, but if you want to consume from multiple topics and write to multiple topics on another Kafka cluster, you can do something like this:

input {
  kafka {
    topics => ["topic1", "topic2"]
    codec => "json"
    bootstrap_servers => "kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092"
    decorate_events => true
    group_id => "logstash-multi-topic-consumers"
    consumer_threads => 5
  }
}
    
output {
  if [kafka][topic] == "topic1" {
    kafka {
      codec => "json"
      topic_id => "new_topic1"
      bootstrap_servers => "output-kafka-1:9092"
    }
  }
  else if [kafka][topic] == "topic2" {
    kafka {
      codec => "json"
      topic_id => "new_topic2"
      bootstrap_servers => "output-kafka-1:9092"
    }
  }
}

Be careful when specifying your bootstrap servers: use the names on which your Kafka brokers have advertised listeners, otherwise clients will fail to connect after the initial bootstrap.
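As a sketch of what that means on the broker side (the host name `kafka-broker-1` here is illustrative, matching the input above), the broker's `server.properties` must advertise a host:port that Logstash can actually resolve and reach:

```
# Hypothetical broker config (server.properties); values are illustrative.
# The broker may bind to all interfaces...
listeners=PLAINTEXT://0.0.0.0:9092
# ...but clients (including Logstash) reconnect to whatever is advertised here,
# so it must match the name used in bootstrap_servers.
advertised.listeners=PLAINTEXT://kafka-broker-1:9092
```

If `advertised.listeners` points at a host name that is only resolvable inside the broker's own network, Logstash will bootstrap successfully and then fail on the follow-up connections.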

Ref-1: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-group_id

Ref-2: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-decorate_events

Aydin K.
daemonsl
  • Will this end up with 5 consumer threads per topic? Or 5 threads that read from both topics? Or 2 with one topic and 3 with another? – Jim Hoagland Aug 02 '18 at 19:04
  • 2
    Answering my own question, looking at [the source](https://sourcegraph.com/github.com/logstash-plugins/logstash-input-kafka@master/-/blob/lib/logstash/inputs/kafka.rb#L249), it looks like each thread will read from both topics – Jim Hoagland Aug 02 '18 at 19:55
  • The suggested config seems doesn't work and Logstash can not understand the conditional statements ,I have defined tags inside inputs and change the conditional statements and it works now. I have also added my config script as an answer. – Lunatic May 08 '21 at 05:23

The previous answer didn't work for me; Logstash did not recognize the conditional statements in the output. Here is the config that is correct and valid, at least for my case: I defined tags in the input for both Kafka consumers, and documents (in my case, logs) are ingested into separate indexes corresponding to their topics.

input {
  kafka {
    group_id => "35834"
    topics => ["First-Topic"]
    bootstrap_servers => "localhost:9092"
    codec => json
    tags => ["First-Topic"]
  }
  kafka {
    group_id => "35834"
    topics => ["Second-Topic"]
    bootstrap_servers => "localhost:9092"
    codec => json
    tags => ["Second-Topic"]
  }
}

filter {

}
output {
  if "Second-Topic" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      document_type => "_doc"
      index => "logger"
    }
    stdout { codec => rubydebug }
  }
  else if "First-Topic" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      document_type => "_doc"
      index => "saga"
    }
    stdout { codec => rubydebug }
  }
}
Lunatic

Probably this is what you need:

input {
  kafka {
    client_id => "logstash_server"
    topics => ["First-Topic", "Second-Topic"]
    codec => "json"
    decorate_events => true
    bootstrap_servers => "localhost:9092"
  }
}

filter { }

output {
  if [@metadata][kafka][topic] == "First-Topic" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logger"
    }
  }
  else if [@metadata][kafka][topic] == "Second-Topic" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "saga"
    }
  }
  else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "catchall"
    }
  }
}

There's no need to have two separate Kafka inputs if they point to the same bootstrap servers; you just have to specify the list of topics you want Logstash to read from.

You could also add `stdout { codec => rubydebug }` if you want, but that's usually only used while debugging; in a production environment it would cause a lot of noise. `document_type => "_doc"` can also be used but is not a must, and in the new version of Elasticsearch (8.0) that option is already deprecated, so I would simply get rid of it.

And I also added a final `else` statement to the output: if for some reason none of the conditions match, it's important to still send the events to a default index, in this case "catchall".

Dharman
kevn_gonz