I'm trying to add some data from an SQL query to my Elasticsearch, and it's currently running as a Logstash jdbc input like this:
jdbc {
type => "database_log"
jdbc_connection_string => "jdbc:sqlserver://(redacted)"
jdbc_user => (redacted)
jdbc_password => (redacted)
jdbc_driver_library => "D:\ELK_56\sqljdbc4.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
schedule => "0 * * * *"
statement => "select user_id,First_Name from user_login"
Now the problem is that like each hour when the input runs, it just adds the entire result of the query, some 124k rows. My intent is to just add the new rows (of which there are about 3-4 per hour) that have appeared since the last time i queried, and avoid duplicates.
I can't just solve it by adding a WHERE clause to the query since this particular table has no date column or any sort of such identifier.
Is there any way to have logstash or elasticsearch itself substract the previous result from the latest one and just add the new rows?
Thank you for your reply.