
I have a 3-node Cassandra (2.0.3) cluster installed. Here's my table:

CREATE TABLE user (
    id text,
    phone text,
    name text,
    email text,
    PRIMARY KEY (phone, id)
);

I use the DataStax Java driver.

Here's my user object creation:

User user = new User();
user.setId(UUIDs.timeBased().toString());
user.setEmail(null);
user.setName("test-user");
user.setPhone(Credential.MD5.digest("user-" + i));

I create 10k of these (i is the index of the user in my users array). I don't want to use a batch insert, but rather to simulate the stress of inserting multiple records. Here's my code:

Cluster cluster = Cluster.builder()
            .addContactPoints(CASSANDRA_CLUSTER_ADDRESSES)
            .build();
final Session session = cluster.connect(keyspaceName);
final ThreadPoolExecutor tpe = (ThreadPoolExecutor) Executors.newCachedThreadPool();
for (final User user : users) {
    tpe.execute(new Runnable() {
        @Override
        public void run() {
            PreparedStatement ps = 
                session.prepare("INSERT INTO user (id, phone, name, email) VALUES (?, ?, ?, ?)");
            BoundStatement bs = new BoundStatement(ps);
            bs.bind(
                    user.getId(),
                    user.getPhone(),
                    user.getName(),
                    user.getEmail());

            session.executeAsync(bs);
        }
    });
}

tpe.shutdown();
tpe.awaitTermination...
  1. When counting the number of records (using cqlsh) I never get beyond 4k (out of 10k).
  2. Only a single server is doing the writes (according to the OpsCenter write-requests/all-nodes graph). I can't see the reason: the keys are random enough as far as I can tell...

Can someone point me anywhere?

Aviram

1 Answer


When counting the number of records (using cqlsh) I never get beyond 4k (out of 10k).

You are using an unbounded thread pool, which means nearly all writes are executed at the same time. You have probably hit a performance limit, and Cassandra is answering with write timeouts. Try reducing the number of concurrent writes and checking the result of each execution, e.g.:

final ThreadPoolExecutor tpe = (ThreadPoolExecutor) Executors.newFixedThreadPool(20);    
...
ResultSetFuture future = session.executeAsync(bs);
try {
    // blocks until the write completes, and rethrows any write timeout
    future.getUninterruptibly();
} catch (Exception e) {
    e.printStackTrace();
}
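Another way to cap concurrency, instead of shrinking the thread pool, is to gate submissions with a Semaphore so only a fixed number of writes are ever in flight. The sketch below is self-contained: the class and method names are mine, and the actual driver call is replaced by a simulated task (with the driver, you would acquire a permit before session.executeAsync(bs) and release it when the future completes).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedWrites {

    // Runs `tasks` simulated writes, never allowing more than `maxInFlight`
    // of them to execute concurrently. Returns the peak concurrency observed.
    static int runBoundedLoad(int tasks, int maxInFlight) throws InterruptedException {
        final Semaphore permits = new Semaphore(maxInFlight);
        final AtomicInteger inFlight = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newCachedThreadPool();

        for (int i = 0; i < tasks; i++) {
            permits.acquire(); // blocks once maxInFlight writes are pending
            workers.execute(new Runnable() {
                @Override
                public void run() {
                    int now = inFlight.incrementAndGet();
                    int seen;
                    while (now > (seen = peak.get())) {
                        peak.compareAndSet(seen, now);
                    }
                    try {
                        Thread.sleep(1); // stands in for the write latency
                    } catch (InterruptedException ignored) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // frees a slot for the next insert
                    }
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak in-flight: " + runBoundedLoad(1000, 20));
    }
}
```

This keeps the asynchronous style of the original code but makes back-pressure explicit, so the client can never outrun the cluster by more than maxInFlight requests.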

only a single server is doing the writes (using opscenter write-request/all-nodes graph) - I can't see the reason: keys are random enough as far as I can tell...

The primary key is defined as PRIMARY KEY (phone, id). This means phone is the partition key and id is only a clustering key. But if the phone values really are distinct MD5 digests, the rows should be spread across all nodes.
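You can check the spread yourself from cqlsh by looking at the tokens computed for the phone values (this assumes the default Murmur3Partitioner; nearby tokens land on the same node, so 10k random MD5 digests should produce tokens scattered across the whole ring):

```
SELECT token(phone), phone FROM user LIMIT 20;
```

If the tokens look well distributed but OpsCenter still shows one node taking all writes, the imbalance is more likely in how requests are routed than in the data model.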

Martin Weindel
  • Martin, regarding the second part of your answer, I know it should spread correctly. I know that the tokens are good from the opscenter ring view and still only one node is doing the writes. – Aviram Jan 01 '14 at 14:28