
I am new to Apache Cassandra and I am planning to use it as the data repository for a new project because of its write performance. I have set up a Cassandra cluster with three nodes and a replication factor of 3. My program A uses DataStax's cassandra-driver-core 2.1.7 to write and read data from Cassandra. Each execution of the program writes about 50 records into Cassandra using a batch statement. A test of a single execution shows no problem at all. However, when I start to run A more intensively, a problem occurs.

Details are as follows: another program B calls A 40 times within 10 seconds, so there should be 2k records in Cassandra after B finishes executing. However, the number of records actually written to Cassandra was only 25-30% of the 2k (the exact percentage varies randomly in each run of B). I was using cqlsh to check the number of records written, by the way. I need to re-run B several times before all 2k records eventually end up in Cassandra.

I have absolutely no clue now; no error was reported during the execution of either A or B, and from the log, A did get executed 40 times.

I don't know if this is connected to the cluster setup, the consistency level setting, etc., or if there is any tuning I need to do to handle higher-frequency writes.
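For reference, a consistency level can also be set per statement with the 2.1 driver; a minimal sketch (QUORUM is just an illustrative choice here, and p/session refer to the objects in the snippet below):

// Statement.setConsistencyLevel() chooses the consistency for an individual request
BoundStatement bound = p.bind();
bound.setConsistencyLevel(ConsistencyLevel.QUORUM); // e.g. QUORUM with replication factor 3
session.execute(bound);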

The code is something like :

String query = "insert into A (a,b,c,d,e,f) values (?,?,?,?,?,?)";
PreparedStatement p = session.prepare(query);
BatchStatement b = new BatchStatement();
for (int i=0; i<50; i++) {
  BoundStatement b1 = p.bind();
  b1.setInt("a",A);
  ...
  b1.setInt("f",F);
  b.add(b1);
}
session.execute(b);

Any help would be greatly appreciated!

Addition:

I changed my code not to use a batch statement, as @aaron and others suggested. The problem still remains: not all records were written into Cassandra (i.e. I cannot see them using cqlsh's select statement). After a while, I noticed that the problem only occurred for records that had previously been inserted and then removed with a cqlsh delete statement before being inserted again. If the records have never been inserted before, correct results are shown by cqlsh's "select * from". Can anyone enlighten me as to why this is so, and whether there is a way to avoid this from happening? Thanks a lot.
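The non-batched version is essentially the same loop, executing each bound statement on its own (a rough sketch, with the same elided bindings as in the original snippet):

// same prepared insert as before, but one synchronous execute() per row instead of one batch
PreparedStatement p = session.prepare("insert into A (a,b,c,d,e,f) values (?,?,?,?,?,?)");
for (int i = 0; i < 50; i++) {
  BoundStatement b1 = p.bind();
  b1.setInt("a", A);
  // ... bind b through e the same way ...
  b1.setInt("f", F);
  session.execute(b1);
}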

    Could you elaborate on how you execute statements? (a code snippet would be helpful) One thing that comes to mind is if you use the async API and never check the returned futures, you might have requests that fail silently. – Olivier Michallat Aug 19 '15 at 09:33
  • @OlivierMichallat I am using session.execute(). I suppose it's synchronous? I will update my post with the code – firew Aug 19 '15 at 09:40
  • The best use for batch statements is to push upserts to multiple tables atomically. It really was not meant for sending 50 updates to the same table. See how your code works *without* using a batch statement. – Aaron Aug 19 '15 at 12:57
  • @BryceAtNetwork23 Thanks a lot! I will try it – firew Aug 20 '15 at 01:12
  • Cassandra is built for write throughput; for just 2k records you really don't need to batch them, and this scenario certainly does not qualify for batching the queries. Instead, have your program B call A in multiple threads and reduce the consistency level to ONE. To be safe, you could run a read repair on the nodes after your program completes (although it is not mandatory). – Aravind Chamakura Aug 20 '15 at 03:04
  • There definitely seems to be an issue with the insertion. Posting your code would help solve the problem quickly. – Ravindra babu Aug 20 '15 at 07:00
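For reference, the future-checking approach Olivier mentions above would look roughly like this with the 2.1 driver (a sketch only; the futures list and the final loop are illustrative, not taken from the question's code):

// collect the futures returned by executeAsync() and wait on each one,
// so a failed write surfaces as an exception instead of going unnoticed
List<ResultSetFuture> futures = new ArrayList<>();
for (int i = 0; i < 50; i++) {
  BoundStatement b1 = p.bind();
  // ... bind columns a through f as before ...
  futures.add(session.executeAsync(b1));
}
for (ResultSetFuture f : futures) {
  f.getUninterruptibly(); // throws a DriverException if the corresponding write failed
}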

0 Answers