
I've designed a schema in Cassandra, but I'm wondering if there's a better way to do things. The issue is that most, if not all, of the reads are dynamic. I've built a segmentation system as an application service: it reads a dynamic custom query (completely unrelated to Cassandra, though the query language is strict and limited to the application), queries Cassandra, and merges the results.

I've made most of the column families as wide as seemed reasonable, and because the data is extremely write-intensive, I used composite keys to partition the write load.
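(For illustration only, a minimal CQL sketch of the wide-row, composite-key layout described above; the table and column names are hypothetical, and the bucket column is one common way to spread one key's writes across partitions.)

    -- Hypothetical event table: ((user_id, bucket)) is the composite
    -- partition key, so one user's writes are spread over N buckets;
    -- event_time clusters the columns inside each partition (wide rows).
    CREATE TABLE user_events (
        user_id    text,
        bucket     int,          -- e.g. hash(event) % 16, chosen at write time
        event_time timeuuid,
        event_type text,
        payload    text,
        PRIMARY KEY ((user_id, bucket), event_time)
    );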

This basically implements an application-specific query layer on top of Cassandra, including a sort of join or merge operation.
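(Again purely illustrative, with hypothetical tables: each segmentation criterion becomes an independent read against a denormalised, criterion-keyed column family, and the application merges the resulting id sets.)

    -- Hypothetical per-criterion lookup tables, keyed by the criterion:
    -- users_by_country(country text, user_id text, PRIMARY KEY (country, user_id))
    -- users_by_browser(browser text, user_id text, PRIMARY KEY (browser, user_id))
    SELECT user_id FROM users_by_country WHERE country = 'CA';
    SELECT user_id FROM users_by_browser WHERE browser = 'Firefox';
    -- Cassandra has no JOIN, so the service layer intersects the two
    -- user_id sets in memory to produce the segment.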

Are there any limitations to this layout or process?

Daniel
  • I think you will have to be more specific. What do you mean by "dynamic reads"? Why segment then merge? – Raedwald Jul 03 '13 at 07:27
  • @Raedwald Basically, the application layer exposes a query service A. Its task is to serve as a data segmentation service. The segmentation is completely unrelated to Cassandra (although the data is stored in it). Instead of touching Cassandra directly, we expose a layer of indirection to provide much more powerful segmentation. For example, someone could request the dataset "find all users who are from Canada, whose browser is Firefox, and who searched 5 times on the homepage after logging in". That's the kind of data segmentation I'm talking about. – Daniel Jul 03 '13 at 07:39
  • The service layer is responsible for performing this segmentation. I'm trying to end up with extremely large rows (column-wise) to allow quick reads, but since some of the data are counters and others are other types, the data has to be split across two column families automatically (see the sketch after these comments). – Daniel Jul 03 '13 at 07:40
  • The query is neither that clear nor in English, FYI; it's just an example. – Daniel Jul 03 '13 at 07:46
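(Illustrating the counter split Daniel describes, with hypothetical tables: Cassandra does not allow counter columns and regular columns in the same column family, so the data naturally lands in two tables keyed the same way, and the service stitches a row back together from both.)

    -- Counters must live in their own column family:
    CREATE TABLE user_counters (
        user_id text,
        metric  text,
        value   counter,
        PRIMARY KEY (user_id, metric)
    );

    -- Everything non-counter goes in a sibling table with the same key:
    CREATE TABLE user_attributes (
        user_id text,
        name    text,
        value   text,
        PRIMARY KEY (user_id, name)
    );
    -- The service reads both tables and merges the results per user_id.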

2 Answers


If you are trying to do some kind of OLAP using Cassandra as the back-end, I think you will have problems. The advice I've seen on designing Cassandra tables is to start with the queries you expect to run, then design denormalised tables that make those queries fast. So you need to know what the queries are, and it sounds like that is not the case for your application. Perhaps an RDBMS would be better?

Raedwald
  • An RDBMS wouldn't work because the database is mostly handling writes: I'd say 90% writes and 10% reads, and the write throughput is fairly high. Cassandra offers really good write performance and no single point of failure, which is key. – Daniel Jul 03 '13 at 11:23

One option is PlayOrm for Cassandra (really an object-NoSQL mapping, not a relational one, as it follows many NoSQL patterns). It has its own S-SQL language that can join partitions. It is not going to join your billion-row table with another billion rows, but if your partitions are, say, under a million rows, it can help you out there.

NoSQL occasionally calls for client-side joins, depending on context, and PlayOrm saves you most of that work on the rare occasions you do need a join in NoSQL; many times denormalisation is better.

The patterns in PlayOrm are also different from Hibernate's. In a one-to-many, for example, the FKs for the many side are embedded in the owning row, as this is how you do it in NoSQL (rough sketch below).
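(A rough CQL analogue of that embedded-FK pattern, not PlayOrm's actual API; the table and column names are hypothetical.)

    -- One-to-many without a join table: the parent row carries the
    -- child keys inline as a collection.
    CREATE TABLE accounts (
        account_id uuid PRIMARY KEY,
        name       text,
        trade_ids  set<uuid>   -- "FKs" to rows in a trades table
    );
    -- Reading the children is a second, client-side query by those ids:
    -- SELECT * FROM trades WHERE trade_id IN (...);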

later, Dean

Dean Hiller