0

I have newly subscribed to elastic platinum license to use elastic cloud services. I want to design the ingestion pipelines, which would pull data from snowflake data warehouse into the elasticsearch service. I have explored and found following methods, please suggest the best approach to design the pipeline. Also point me to any documentation available for it.

  1. Setup a logstash server, install logstash service in it. Define a pipeline which would pull data from snowflake into elastic service, using jdbc snowflake plugins.
  2. Use logstash pipelines, in elastic cloud. Trigger the pipeline through a rest api when ingesting. This cloud pipeline setup does not seem to have much documentation on how to implement.
  3. Ingesting through rest apis. I think this method is not the best, as configuring pipelines is not very flexible.
  4. Any other method such as Beats.
neo
  • 425
  • 3
  • 11

1 Answers1

0

I would go with #1:

You can use Logstash to pull the data from Snowflake via JDBC driver and then send the data to Elastic Cloud:

Some reference links:

Snowflake JDBC driver https://docs.snowflake.com/en/user-guide/jdbc.html

Logstash to Elastic cloud using cloud_id: https://www.elastic.co/guide/en/logstash/current/connecting-to-cloud.html

Example Logstash conf file

input {
    jdbc {
        jdbc_driver_library => "/etc/logstash/pipeline/snowflake/dependency/snowflake-jdbc-3.9.2.jar"
        jdbc_driver_class => "com.snowflake.client.jdbc.SnowflakeDriver"
        jdbc_connection_string => "jdbc:snowflake://url.eu-west-1.snowflakecomputing.com/?db=dbname&warehouse=House"
        jdbc_user => "user"
        jdbc_password => "password"
        schedule => "15 * * * *"
        statement => "
            SELECT TOP 100000 * FROM dbname.CORE.table WHERE timestamp >= '2038-10-02 00:00:00' AND timestamp < '2038-10-02 23:59:59'
        "
        jdbc_fetch_size => 1000
    }
}

output {
    elasticsearch {
        cloud_id => "<cloud id>" 
        api_key => "<api key>"
    }

}

To keep both databases on sync you can use the sql_last_value parameter, more info:

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_state

llermaly
  • 2,331
  • 2
  • 16
  • 29