I am using Spring Cloud Data Flow to create a custom stream that loads data into Snowflake. I have written a custom sink that uses Snowflake's JDBC driver, and the approach is similar to any other database update, using the following steps:
- Create a connection pool (HikariCP) to obtain Snowflake database connections (the pool setup is sketched right after this list).
- Using a prepared statement, build up a batch of rows to commit all at once.
- Using a scheduled timer, commit the batch to Snowflake (the timer-driven flush is sketched after the code further below).
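The pool is configured roughly like the sketch below; the helper class name, the JDBC URL details and the pool size are simplified placeholders, not my exact code:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class SnowflakePool {

    // Builds the HikariCP pool that backs DataSource.getConnection() in the snippet below.
    public static HikariDataSource build(String jdbcUrl, String user, String password) {
        HikariConfig config = new HikariConfig();
        config.setDriverClassName("net.snowflake.client.jdbc.SnowflakeDriver");
        // jdbcUrl looks like jdbc:snowflake://<account>.snowflakecomputing.com/?warehouse=...&db=...&schema=...
        config.setJdbcUrl(jdbcUrl);
        config.setUsername(user);
        config.setPassword(password);
        config.setMaximumPoolSize(2);  // a couple of connections is plenty for a single sink instance
        return new HikariDataSource(config);
    }
}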
This is when I noticed that the batch loads into Snowflake very slowly: one or two records at a time, and a batch of 8K rows took well over 45 minutes to land in the Snowflake table (using an XS warehouse).
My question: is there a better/recommended method to stream data into Snowflake? I am aware of the Kafka connector for Snowflake and of Snowpipe (which uses an internal/external stage), but these are not options we would like to pursue. The relevant part of the sink code is below.
// Insert every buffered message as one row, parsed into a VARIANT via PARSE_JSON.
String compiledQuery = "INSERT INTO " + env.getProperty("snowtable")
        + " SELECT PARSE_JSON(column1) FROM VALUES (?)";

try (Connection conn = DataSource.getConnection();          // connection from the HikariCP pool
     PreparedStatement preparedStatement = conn.prepareStatement(compiledQuery)) {
    for (int i = 0; i < messageslocal.size(); i++) {
        preparedStatement.setString(1, messageslocal.get(i));
        preparedStatement.addBatch();
    }
    // Send the whole batch to Snowflake in a single executeBatch() call.
    preparedStatement.executeBatch();
} catch (SQLException e) {
    // log and handle the failure
}
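For completeness, the scheduled flush around that insert looks roughly like the sketch below; the class name, the flush.interval.ms property and the buffer handling are simplified placeholders, and scheduling is enabled elsewhere via @EnableScheduling:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.springframework.scheduling.annotation.Scheduled;

public class SnowflakeBatchFlusher {

    // Thread-safe buffer that the sink's message handler appends incoming payloads to.
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();

    public void enqueue(String message) {
        buffer.offer(message);
    }

    // Timer-driven flush: drain whatever has accumulated and push it as one JDBC batch.
    @Scheduled(fixedDelayString = "${flush.interval.ms:5000}")
    public void flushBatch() {
        List<String> messageslocal = new ArrayList<>();
        buffer.drainTo(messageslocal);
        if (!messageslocal.isEmpty()) {
            insertBatch(messageslocal);  // runs the prepared-statement batch shown above
        }
    }

    private void insertBatch(List<String> rows) {
        // JDBC batch insert exactly as in the snippet above
    }
}

So each timer tick results in exactly one executeBatch() call against Snowflake.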
Thank you!