
I'm trying to keep a MongoDB collection in sync with an Elasticsearch index using Logstash.

I'm using the Logstash JDBC plugin with the DBSchema JDBC Driver Library for this.

This is the Logstash configuration file I'm using:

input {
  jdbc{
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "/path/to/mongojdbc1.8.jar"
    jdbc_user => ""
    jdbc_password => ""
    jdbc_connection_string => "jdbc:mongodb://127.0.0.1:27017/db1"
    statement => "db.collection1.find({ }, { '_id': false })"
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "testing"
    user => ""
    password => ""
  }
}

This works, but every time I run Logstash the records are inserted into Elasticsearch again, so I end up with duplicates. I do not want records to be inserted more than once. Also, if I modify a document in MongoDB and run Logstash again, it should update the same record in Elasticsearch instead of creating a new document. How do I achieve this?

  • It will only update the same document in Elasticsearch if you use the `document_id` option in your Logstash output with some unique identifier from your source. If you do not use it, Elasticsearch will generate a unique id for every document. Do you have a unique identifier field in your source? – leandrojmp May 27 '20 at 12:56
  • Try adding `record_last_run => true` and `last_run_metadata_path => "/usr/share/logstash/bin/since"` to the jdbc section of your Logstash config. You can read more about it here: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html – eladyanai May 27 '20 at 13:03
  • @leandrojmp got it, I need to set a unique identifier for each document first – Krithik Vaidya May 27 '20 at 16:07
  • @eladyanai that's another good solution, thanks – Krithik Vaidya May 27 '20 at 16:07
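A minimal sketch of the incremental approach from the comments, assuming each document carries a hypothetical numeric field `seq` that your application increases whenever a document is inserted or modified. The jdbc input substitutes `:sql_last_value` into the statement and saves the last seen value of the tracking column in the `last_run_metadata_path` file between runs. Whether the substituted value is accepted inside a MongoDB query string by the DBSchema driver is an assumption you should verify. This only skips unchanged documents; changed ones still need a stable `document_id` in the output to be overwritten rather than duplicated.

```
input {
  jdbc {
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "/path/to/mongojdbc1.8.jar"
    jdbc_user => ""
    jdbc_password => ""
    jdbc_connection_string => "jdbc:mongodb://127.0.0.1:27017/db1"
    # Poll every minute (cron syntax) instead of running once
    schedule => "* * * * *"
    # Only fetch documents whose hypothetical "seq" field is newer than the last run
    statement => "db.collection1.find({ 'seq': { '$gt': :sql_last_value } }, { '_id': false })"
    # Track the value of the "seq" column rather than the time of the last run
    use_column_value => true
    tracking_column => "seq"
    tracking_column_type => "numeric"
    record_last_run => true
    # File where the last seen "seq" value is stored between runs
    last_run_metadata_path => "/usr/share/logstash/bin/since"
  }
}
```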

1 Answer


You can keep your documents in sync by their ids. The options you need are described in the Logstash Elasticsearch output plugin documentation.

Per the docs, set `doc_as_upsert` to `true` together with `action => "update"` (`doc_as_upsert` only takes effect in update mode), and pass a `document_id` in the output:

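output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "testing"
    user => ""
    password => ""
    # doc_as_upsert only applies with the update action; settings are not comma-separated
    action => "update"
    doc_as_upsert => true
    document_id => "%{id}"
  }
}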

Note that in `document_id => "%{id}"`, `id` is the name of the field that holds your document's unique identifier.
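If the collection has no separate `id` field, MongoDB's own `_id` is the obvious candidate. A minimal sketch, assuming the `statement` is changed so it no longer excludes `_id` (for example `db.collection1.find({ })`) and that the driver hands `_id` to Logstash as a value it can serialize; some setups return a BSON ObjectId that Logstash cannot convert, so test this first. Elasticsearch does not accept `_id` as a field inside the document body, so the sketch moves it under `[@metadata]`, which is not sent to Elasticsearch. The name `mongo_id` is hypothetical.

```
filter {
  mutate {
    # Move MongoDB's _id out of the event body; [@metadata] fields are not
    # sent to Elasticsearch. "mongo_id" is a hypothetical name.
    rename => { "_id" => "[@metadata][mongo_id]" }
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "testing"
    # Reuse the MongoDB id so each run overwrites instead of duplicating
    document_id => "%{[@metadata][mongo_id]}"
  }
}
```

With the default `index` action, writing an event under an id that already exists replaces the stored document, so re-running the pipeline updates modified documents instead of adding copies.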

Alex Baidan