
[I asked this question this morning but did not get any response or feedback, so I am raising it again in the hope of getting some help from the community.]

I am working on a web application written in Java and backed by Cassandra, a NoSQL database.

Cassandra allows highly concurrent database operations and can accept a large number of operations in a single batch. The recommendation I have seen is to use batch sizes in the hundreds of operations. But I cannot figure out how to efficiently merge the database operations submitted by hundreds of concurrent users of my application. How can these operations be merged into batches?

EDIT: I know how to submit a batch query to the database. What I am asking is how to collect the queries submitted by several concurrent user sessions into a single batch.

Rajat Gupta

2 Answers


You're prematurely optimizing. Almost nobody using Cassandra has performance problems with inserts, and of those who do, I can't think of any whose problem was overhead from small batch sizes rather than things like memtable thresholds and compaction.

The stress.py and stress.java Cassandra benchmarks use batch sizes of a single row.
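For illustration only, and not something those benchmarks do: if measurement ever did show per-request writes to be the bottleneck, one common way to coalesce operations from many sessions is a shared queue drained by a single writer thread. The following is a minimal sketch under that assumption. The names `WriteCoalescer`, `maxBatch`, and `maxWaitMs` are hypothetical, and the `sink` callback stands in for whatever client call actually submits a batch to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: request threads call submit(), one background
// thread runs run() and hands grouped operations to the sink.
class WriteCoalescer<T> implements Runnable {
    private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
    private final Consumer<List<T>> sink; // assumption: wraps the real batch call
    private final int maxBatch;           // upper bound on operations per batch
    private final long maxWaitMs;         // how long to wait for the first operation

    WriteCoalescer(Consumer<List<T>> sink, int maxBatch, long maxWaitMs) {
        this.sink = sink;
        this.maxBatch = maxBatch;
        this.maxWaitMs = maxWaitMs;
    }

    // Called from any user session thread; never blocks (unbounded queue).
    public void submit(T op) {
        pending.add(op);
    }

    // Waits up to maxWaitMs for one operation, then takes whatever else is
    // already queued, up to maxBatch in total. Returns an empty list on
    // timeout or interruption.
    public List<T> nextBatch() {
        List<T> batch = new ArrayList<>(maxBatch);
        try {
            T first = pending.poll(maxWaitMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                pending.drainTo(batch, maxBatch - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return batch;
    }

    // Writer loop: runs until the thread is interrupted. Operations still
    // queued at that point are not flushed in this sketch.
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            List<T> batch = nextBatch();
            if (!batch.isEmpty()) {
                sink.accept(batch);
            }
        }
    }
}
```

The trade-off is added latency and a window in which acknowledged operations exist only in memory, which is part of why coalescing like this is rarely worth it unless profiling points there.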

jbellis
    Thanks Jonathan! My inserts currently go out in batches of no more than 15-25 operations per query (in some cases many more), but on the Hector mailing list I saw people suggesting writes in batches starting at around 500 columns per query. I wanted to know whether there is any way to merge the database operations needed to support multiple user sessions on my application. I just want to make sure I am taking full advantage of what Cassandra provides and is capable of supporting! – Rajat Gupta Feb 18 '11 at 18:56

Disclaimer: I haven't done anything hands-on in Java with a Cassandra-style database.

But I was really curious about how caching is handled for these, so I googled it a bit and found:

ehCache

Gora

Kundera (google code, github)

So there are people working to build JPA for NoSQL databases. This makes a lot of sense to me, since an object graph doesn't really translate well to an RDBMS. See: a Stack Overflow question comparing Cassandra and RDBMS, and Ted Neward on Object-Relational Impedance Mismatch.

My point is that there are people out there researching and trying to solve the kind of problems you are asking about. It seems this stuff is all bleeding edge though. Have fun and don't cut yourself!

Jim