
Using Logstash, I would like to know how to send data to Elasticsearch without creating duplicates. That is, I want to send only data that is not yet present in the ES instance, rather than re-sending data that is already there.

Today I delete all the data in the specific index in ES and then resend everything that is in the database. This prevents duplicates, but it is not ideal, since I have to delete the data manually.

This is the .config I am currently using:

input {
    jdbc {
        jdbc_driver_library => "/Users/Carl/Progs/logstash-6.3.0/mysql-connector-java/mysql-connector-java-5.1.46-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://*****"
        jdbc_user => "****"
        jdbc_password => "*****"
        schedule => "0 * * * *"
        statement => "SELECT * FROM carl.customer"
    }
}
filter {
    mutate {convert => { "long" => "float"} }
}
output {
    #stdout { codec => json_lines }
    elasticsearch {
        hosts => "localhost"
        index => "customers"
    }
}
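One common way to avoid duplicates (not shown in the config above) is to give each document a deterministic `document_id` in the elasticsearch output. If the same row is sent again, Elasticsearch overwrites the existing document instead of creating a new one. A minimal sketch, assuming the `carl.customer` table has a primary-key column named `id` (the column name is an assumption, not from the question):

```
output {
    elasticsearch {
        hosts => "localhost"
        index => "customers"
        # Assumption: "id" is the table's primary key. Re-indexing the
        # same row then updates the existing document in place rather
        # than appending a duplicate.
        document_id => "%{id}"
    }
}
```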
  • This answer should help: https://stackoverflow.com/questions/40364951/how-should-i-use-sql-last-value-in-logstash/40365180#40365180 – Val Aug 06 '18 at 15:49
  • Do you have in your table a column for time when the record was created or updated? – Michael Dz Aug 07 '18 at 09:32
  • The database which I am sending the data from gets updated by clearing the whole database and then loading it with updated data, because I'm using an ETL tool. So the data does not have an updated column, only a created one, but the records can have been updated since they were created. I am thinking about finding a way of using Logstash to automatically delete the index. – pumpmancarl Aug 07 '18 at 13:14
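Since the source table is truncated and reloaded on each ETL run (so `sql_last_value` tracking cannot distinguish new rows from reloaded ones), another option is to derive the document ID from the row's content with the fingerprint filter: identical rows always map to the same document, so each reload upserts in place. A sketch, assuming hypothetical business-key columns `customer_number` and `email` (substitute whatever columns uniquely identify a customer):

```
filter {
    fingerprint {
        # Hypothetical columns: pick fields that uniquely identify a row.
        source => ["customer_number", "email"]
        target => "[@metadata][fingerprint]"
        method => "SHA1"
        concatenate_sources => true
    }
}
output {
    elasticsearch {
        hosts => "localhost"
        index => "customers"
        # Same row content -> same ID -> overwrite instead of duplicate.
        document_id => "%{[@metadata][fingerprint]}"
    }
}
```

Note that neither approach removes documents for rows that were deleted upstream; if deletions must be reflected too, reindexing into a fresh index and switching an alias over is a common pattern.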

0 Answers