
I have a 20 GB table with 50 million rows that I need to migrate to Elasticsearch using the Logstash JDBC input plugin. I have tried the basic implementation, but I need help migrating the data in batches, i.e. only 10,000 rows at a time. I am not sure how and where to specify this count, and how to update it the next time I run Logstash. Please help me solve this issue.

This is what I have:

input {
    jdbc {
        jdbc_driver_library => "mysql-connector-java-5.1.12-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost/db"
        jdbc_validate_connection => true
        jdbc_user => "root"
        jdbc_password => "root"
        clean_run => true
        record_last_run => true
        use_column_value => true
        jdbc_paging_enabled => true
        jdbc_page_size => 5
        tracking_column => "id"
        statement => "select * from employee"
    }
}

Thanks in advance.

  • Does this help? https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_dealing_with_large_result_sets – Val May 13 '16 at 14:45
  • Thank you Val, can you please give me an example of how to implement it? – Chitra May 16 '16 at 05:02
  • There's an excellent blog post that provides an example from A to Z: https://www.elastic.co/blog/logstash-jdbc-input-plugin – Val May 16 '16 at 05:03
  • Yes, I went through this blog, but it does not satisfy my requirement. I can limit the query to fetch 10,000 records at a time, but how will it know, the next time, to fetch the next 10,000 records starting from 10,001? So there is some parameter which needs to be passed to the plugin, but I'm not sure how to pass it externally. E.g. statement => "select * from employee where id > :sql_last_value limit 10" takes only 10 records the first time and stores the id of the 10th row in the last-run record. But the next time I run it, it fetches the same rows again instead of rows 11-20. – Chitra May 16 '16 at 06:01
  • You can set [`jdbc_paging_enabled`](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-jdbc_paging_enabled) to `true` in order to achieve what you need. – Val May 16 '16 at 11:06
  • Val, this is what I have (the config shown in the question above). Can you point out where I might have gone wrong? This is great help from you, thanks – Chitra May 16 '16 at 11:26
  • Please update your question instead of adding code to the comments. It's more legible. – Val May 16 '16 at 11:30

1 Answer


You need to set `jdbc_paging_enabled` to `true` in order for pagination to work.

But you also need to make sure that `clean_run` is set to `false`; when it is `true`, the last-run state (i.e. `sql_last_value`) is reset at the start of every run, so each run fetches the same rows again from the beginning.
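For reference, here is a minimal sketch of a config combining both settings with the `:sql_last_value` approach from the comments. The connection details, the `employee` table, and its numeric `id` column are taken from the question; the page size of 10,000 matches the batch size the question asks for:

    input {
        jdbc {
            jdbc_driver_library => "mysql-connector-java-5.1.12-bin.jar"
            jdbc_driver_class => "com.mysql.jdbc.Driver"
            jdbc_connection_string => "jdbc:mysql://localhost/db"
            jdbc_validate_connection => true
            jdbc_user => "root"
            jdbc_password => "root"
            # Keep the last-run state between runs; clean_run => true would reset it.
            clean_run => false
            record_last_run => true
            # Track the numeric id column so :sql_last_value resumes where the previous run stopped.
            use_column_value => true
            tracking_column => "id"
            # Let the plugin paginate the query itself, 10,000 rows per page.
            jdbc_paging_enabled => true
            jdbc_page_size => 10000
            # ORDER BY keeps the pages deterministic.
            statement => "SELECT * FROM employee WHERE id > :sql_last_value ORDER BY id ASC"
        }
    }

With `record_last_run => true` the plugin persists the highest `id` it has seen (by default in `$HOME/.logstash_jdbc_last_run`), so a later run continues from the next row instead of re-reading the same ones. An `output { elasticsearch { ... } }` block is omitted here for brevity.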

  • Thank you so much for your guidance, I could achieve what I wanted. – Chitra May 17 '16 at 06:27
  • @Val How does pagination behave if the aggregate filter is used? That is, if a page is cut at a certain id and rows with that same id remain on the next page, will the aggregate filter flush and lose that data, or will it be merged into the existing event? – Max Jul 17 '20 at 14:56
  • @Max please create a new question with your exact needs – Val Jul 17 '20 at 14:56
  • @Val Here's a similar unanswered question, so I didn't want to duplicate the question. :) https://stackoverflow.com/questions/61032373/logstash-aggregate-filter-use-with-pagination#comment111331511_61032373 – Max Jul 17 '20 at 15:04