I have tried using batching and other concurrency parameters with dsbulk but couldn't see any improvement. I then switched to the DataStax Cluster and Session API to create a session and used that session to execute batch statements.
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

// Cluster.builder() is a static factory method on Cluster
Cluster cluster = Cluster.builder()
        .addContactPoints("0.0.0.0", "0.0.0.0")
        .withCredentials("userName", "pwd")
        .withSSL()
        .build();
Session session = cluster.connect("keySpace");

BatchStatement batchStatement = new BatchStatement();
batchStatement.add(new SimpleStatement("String query with JSON Data"));
session.execute(batchStatement);
I used an ExecutorService with 10 threads, with each thread inserting 1,000 queries per batch.
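The threading setup described above can be sketched roughly as follows. The class and method names (`BatchLoader`, `chunk`, `load`) are my own for illustration, and the `Consumer<List<String>>` parameter stands in for the driver call (`session.execute(batchStatement)`) so the skeleton stays self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchLoader {

    // Splits the row list into fixed-size chunks; each chunk becomes one batch.
    static List<List<String>> chunk(List<String> rows, int batchSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            chunks.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return chunks;
    }

    // Submits each chunk to a fixed thread pool. executeBatch is where the
    // actual driver call (building a BatchStatement and calling
    // session.execute) would go.
    static void load(List<String> rows, int threads, int batchSize,
                     Consumer<List<String>> executeBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> batch : chunk(rows, batchSize)) {
            pool.submit(() -> executeBatch.accept(batch));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```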
I tried something like the above and it worked fine for my use case: I was able to insert 2 million records in 15 minutes. I create the insert queries using the CQL JSON keyword, building the JSON from the ResultSet. You can also use executeAsync, in which case your application thread finishes in a minute or two, but the Cassandra cluster still took the same 15 minutes to write all the records.
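For the "JSON keyword" queries mentioned above, a minimal helper might look like this. The keyspace/table names are placeholders, and the helper name (`insertJson`) is my own; it just builds a CQL `INSERT ... JSON` statement string, escaping single quotes in the JSON payload:

```java
public class CqlJson {

    // Builds a CQL "INSERT INTO ks.table JSON '...'" statement, where the
    // whole row is supplied as one JSON document. Single quotes inside the
    // JSON are doubled, per CQL string-literal escaping.
    static String insertJson(String keyspace, String table, String json) {
        return "INSERT INTO " + keyspace + "." + table
                + " JSON '" + json.replace("'", "''") + "'";
    }
}
```

The resulting string can then be wrapped in a SimpleStatement and added to a BatchStatement as in the earlier snippet.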
To read the data from the source Sybase DB, I used jdbcTemplate.queryForList, which returns the records as a List<Map<String, Object>>; each map in that list can be converted to JSON using Jackson's ObjectMapper writeValueAsString method.
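To make the map-to-JSON step concrete without pulling in Jackson, here is a tiny stand-in for what ObjectMapper.writeValueAsString does for a flat row map (the class and method names are hypothetical; in the actual pipeline you would use Jackson, which also handles nesting, nulls, and escaping properly):

```java
import java.util.Map;
import java.util.stream.Collectors;

public class RowJson {

    // Simplified stand-in for Jackson's ObjectMapper.writeValueAsString:
    // serializes a flat row map (as returned by jdbcTemplate.queryForList)
    // into a JSON object. Numbers and booleans are emitted bare, everything
    // else as a quoted string.
    static String toJson(Map<String, Object> row) {
        return row.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + value(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String value(Object v) {
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return "\"" + v + "\"";
    }
}
```

Each resulting JSON string is then what goes into the `INSERT ... JSON` query for one row.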
Hope this will be useful to someone.