Cassandra batch query consistency

Question

I am using the Cassandra Java driver.

I have a use case in which I batch-insert data into various Cassandra tables that have different partition keys.

    BatchStatement batch = new BatchStatement();
    batch.add(query1);
    batch.add(query2);
    // ... more statements
    session.executeAsync(batch);

Suppose I have 20 queries in my batch statement; 15 execute successfully and 5 fail.

How can I tell which queries failed and which succeeded?

I am using executeAsync for performance reasons.

Edit 1:

We are using 'unlogged batch query'.

Always try to keep batch size small. http://stackoverflow.com/questions/34699841/what-is-the-batch-limit-in-cassandra — Ashraful Islam, Mar 20 '17 at 09:06

Sergey Kuptsov · Answer 1 · 2017-03-20T10:56:58.613

2

Logged multi-partition batches are atomic, but at the cost of performance. From the official Cassandra documentation:

Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any of the batch succeeds, all of it will.

So either all queries succeed or none do.

Unlogged multi-partition batches are not atomic. It is better to run each query asynchronously and collect the results individually, or to group the queries by partition key and run each group in a logged batch.

For example:

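    // Needs java.util.*, java.util.stream.Collectors and, from the driver,
    // com.datastax.driver.core.ResultSet and ResultSetFuture.
    List<String> queries = new ArrayList<>(); // CQL strings to execute

    // Start all queries without blocking.
    List<ResultSetFuture> results = queries.stream()
            .map(query -> session.executeAsync(query))
            .collect(Collectors.toList());

    // Wait for each result; a failed query becomes an empty Optional.
    results.stream()
            .map(result -> {
                try {
                    return Optional.ofNullable(result.getUninterruptibly());
                } catch (Exception ex) {
                    // handle or log the failure here
                    return Optional.<ResultSet>empty();
                }
            })
            .forEach(outcome -> {
                // do something with each outcome
            });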
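If you also need to know which query failed, keep each future associated with the position of the query that produced it. Below is a minimal driver-free sketch of that idea. The class `AsyncQueryTracker`, the method `runAndCollectFailures` and the `executeAsync` function parameter are hypothetical names for illustration only: the function is a stand-in for `session.executeAsync(query)`, and `CompletableFuture.join()` plays the role of `getUninterruptibly()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

class AsyncQueryTracker {

    /**
     * Starts every query, then waits for each future and returns the failures
     * keyed by the query's index in the input list.
     * `executeAsync` is a hypothetical stand-in for session.executeAsync(query).
     */
    static Map<Integer, Throwable> runAndCollectFailures(
            List<String> queries,
            Function<String, CompletableFuture<Void>> executeAsync) {

        // Start everything first so the queries can run concurrently.
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        for (String query : queries) {
            inFlight.add(executeAsync.apply(query));
        }

        // Then wait for each one and record which index produced which error.
        Map<Integer, Throwable> failures = new LinkedHashMap<>();
        for (int i = 0; i < inFlight.size(); i++) {
            try {
                inFlight.get(i).join();
            } catch (CompletionException ex) {
                failures.put(i, ex.getCause());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Fake executor (assumption): any query containing "bad" fails.
        Function<String, CompletableFuture<Void>> fake = q -> CompletableFuture.runAsync(() -> {
            if (q.contains("bad")) {
                throw new IllegalStateException("write failed: " + q);
            }
        });

        List<String> queries = new ArrayList<>();
        queries.add("INSERT good 1");
        queries.add("INSERT bad 2");
        queries.add("INSERT good 3");

        runAndCollectFailures(queries, fake).forEach((index, error) ->
                System.out.println(queries.get(index) + " -> " + error.getMessage()));
    }
}
```

Indexing by position rather than by query text keeps two identical statements from overwriting each other in the result map. The failed indexes can then be retried or logged individually.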

Also turn on connection pooling in the Java client, so that a new connection is not established for each query: http://docs.datastax.com/en/developer/java-driver/2.1/manual/pooling/
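The grouping by partition key suggested earlier can be sketched roughly as follows. This is only an illustration using plain Java collections: the `Write` class, its fields and `groupByPartition` are hypothetical names, and it assumes you already know the table and partition key value of each statement. Because a partition belongs to one table, the grouping key here combines both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionGrouping {

    /** Hypothetical pairing of a CQL string with the table and partition key it targets. */
    static final class Write {
        final String table;
        final String partitionKey;
        final String cql;

        Write(String table, String partitionKey, String cql) {
            this.table = table;
            this.partitionKey = partitionKey;
            this.cql = cql;
        }
    }

    /**
     * Groups statements so that each group touches exactly one partition.
     * Each resulting list could then go into its own batch.
     */
    static Map<String, List<String>> groupByPartition(List<Write> writes) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Write w : writes) {
            String groupKey = w.table + "/" + w.partitionKey;
            groups.computeIfAbsent(groupKey, k -> new ArrayList<>()).add(w.cql);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder statements (assumption), not real CQL from the question.
        List<Write> writes = Arrays.asList(
                new Write("users", "42", "INSERT INTO users ... (42, 'a')"),
                new Write("events", "42", "INSERT INTO events ... (42, 'x')"),
                new Write("users", "42", "INSERT INTO users ... (42, 'b')"),
                new Write("users", "7", "INSERT INTO users ... (7, 'c')"));

        groupByPartition(writes).forEach((key, statements) ->
                System.out.println(key + " -> " + statements.size() + " statement(s)"));
    }
}
```

Each group can be sent as one small batch, and a failure of that batch then identifies exactly which statements need a retry.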

edited Mar 20 '17 at 10:56

answered Mar 20 '17 at 09:07


I think batches are atomic only if they target a single partition key – Prakash P Mar 20 '17 at 09:31
But in my use case the batch contains queries belonging to different partitions – Prakash P Mar 20 '17 at 09:31
So, this is a common misuse of the batch statement; see https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.x8t4t329p – Sergey Kuptsov Mar 20 '17 at 09:40
Logged Batches are atomic among multiple partition keys but this is expensive. – Sergey Kuptsov Mar 20 '17 at 09:43
We are using `Unlogged batch` – Prakash P Mar 20 '17 at 09:47
OK, so they are not atomic. Why not run each query asynchronously and then collect the result of each individually? An unlogged multi-partition batch gives no performance benefit; see https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/ – Sergey Kuptsov Mar 20 '17 at 10:02
Many thanks, but I have 150k queries / second. Won't firing that many queries per second create too many connections? – Prakash P Mar 20 '17 at 10:08
First, you can group some statements by primary key and use a logged batch. Second, turn on connection pooling in the Java client so that a new connection is not established for each query: http://docs.datastax.com/en/developer/java-driver/2.1/manual/pooling/ – Sergey Kuptsov Mar 20 '17 at 10:14
Could you please provide an example of how I can track the status of each async query? – Prakash P Mar 20 '17 at 10:41
