Cassandra data modeling accepts that "Denormalization and duplication of data is a fact of life with Cassandra". But one drawback of denormalized data is that updates become hard. For example, if I have three tables, each serving a different query, selecting is fine. But what if my app needs to change a username, so that all three tables have to be updated? The update on the first table looks OK. What about the other two? Are those updates going to be very expensive? How should I handle this case?

CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    age int
);

CREATE TABLE users_by_email (
    email text PRIMARY KEY,
    username text,
    age int
);

CREATE TABLE groups (
    groupname text,
    username text,
    email text,
    age int,
    hash_prefix int,
    PRIMARY KEY ((groupname, hash_prefix), username)
);
Hammer
  • Yes, you have to update them separately. It's not expensive in terms of compute, though; it should be quite fast. At least, that's what I've seen in practice. – Don Branson May 16 '16 at 16:06
  • But the latter updates are equivalent to a search plus a change, right? Shouldn't that be slow, as you are actually working on a non-primary key? – Hammer May 17 '16 at 00:19
  • Have you run your updates and timed them? That's the way to know for sure. – Don Branson May 17 '16 at 01:02
  • Have you looked into using materialized views for simple denormalizations like that? – Chris Lohfink May 17 '16 at 21:23
  • @Don Branson, yes, I will give it a shot. I just want to understand how Cassandra handles updates. – Hammer May 18 '16 at 00:56
  • @Chris, thanks. Yes, a materialized view is a good idea. But I am just wondering how updates are handled in Cassandra in general, not necessarily in Cassandra 3 only. – Hammer May 18 '16 at 00:58
  • Or is it because there is no real update, only a write behind the scenes, which is what Cassandra is good at? – Hammer May 18 '16 at 01:06
  • Not really an answer, but there is free material at https://academy.datastax.com/ that will walk you through how updates work and data modeling (it's a bit much for an SO question). There's a bunch of stuff on YouTube as well. – Chris Lohfink May 18 '16 at 03:30

2 Answers

This is a typical problem I see when people put a relational model that changes over time into Cassandra. Cassandra is a great database, and it works wonders at what it is built for. Its features enable many different data models, and you can cover almost all use cases. For your use case, though, the question is why you would use Cassandra for a relational model. If you really want Cassandra to cover it, you will have to do a lot of extra work at the application level just to execute updates and keep your data in a consistent state.

Matija Gobec
  • These examples are from the Cassandra docs. Could you be more specific about the question raised? – Hammer May 18 '16 at 03:44
  • As for the question: you can generate a UUID for each user and store it in all related tables, so that you can update the users table without having to go through complicated updates in code. You can even write application code that updates all user information in the tables you posted, but that tends to get complicated. If I'm forced to have such a model, I prefer to do application-level joins and have one user table. – Matija Gobec May 19 '16 at 09:51
  • Using a UUID in all tables means another query is needed to get the actual username, right? – Hammer May 19 '16 at 11:47
  • Yes, it will, but that's the price to pay. – Matija Gobec May 20 '16 at 09:02
  • Since writes are cheap and fast, is it worth adding another query? – Hammer May 20 '16 at 10:14
  • It's worth it. Be careful when doing this, as your data will be in an inconsistent state between reads. If that is a deal breaker for you, think about doing a batch write when you update a user entity across multiple tables. – Matija Gobec May 20 '16 at 11:09
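
To make the batch-write suggestion concrete, here is a minimal sketch in Python that only builds the CQL text for renaming a user across the three tables from the question. The helper names (`cql_literal`, `rename_user_batch`) and the `memberships` argument are hypothetical: the application is assumed to already know which (groupname, hash_prefix) partitions the user belongs to. Because username is part of the primary key in users_by_username and groups, it cannot be changed with UPDATE, so the old row is deleted and a new one inserted. A real application would more likely use prepared statements through a driver than string formatting.

```python
def cql_literal(value):
    """Quote a Python str as a CQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def rename_user_batch(old_name, new_name, email, age, memberships):
    """Build one logged batch that renames a user in all three tables.

    memberships is a list of (groupname, hash_prefix) pairs. It is an
    assumption of this sketch: the application has to track it itself,
    because the groups table cannot be looked up by username alone.
    """
    old, new, mail = (cql_literal(v) for v in (old_name, new_name, email))
    age = int(age)
    stmts = [
        # username is the partition key here, so it cannot be SET:
        # delete the old row and insert a new one.
        f"DELETE FROM users_by_username WHERE username = {old};",
        f"INSERT INTO users_by_username (username, email, age) "
        f"VALUES ({new}, {mail}, {age});",
        # username is a regular column here, so an UPDATE by key works.
        f"UPDATE users_by_email SET username = {new} WHERE email = {mail};",
    ]
    for groupname, hash_prefix in memberships:
        group = cql_literal(groupname)
        prefix = int(hash_prefix)
        # username is a clustering column in groups: delete, then insert.
        stmts.append(
            f"DELETE FROM groups WHERE groupname = {group} "
            f"AND hash_prefix = {prefix} AND username = {old};"
        )
        stmts.append(
            f"INSERT INTO groups (groupname, hash_prefix, username, email, age) "
            f"VALUES ({group}, {prefix}, {new}, {mail}, {age});"
        )
    return "BEGIN BATCH\n" + "\n".join("  " + s for s in stmts) + "\nAPPLY BATCH;"


print(rename_user_batch("bob", "robert", "bob@example.com", 30, [("admins", 1)]))
```

Every statement addresses a row by its full key, so none of them requires a search. A logged batch ensures that all of the statements are eventually applied, but it does not isolate readers from seeing the tables partway through the change.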

After watching a few YouTube clips, it looks like an update in Cassandra is a simple write: the record is appended to the commit log on disk and put into the memtable on the Cassandra server, and an acknowledgement is then sent to the client straight away. The update call finishes there, with no read of the existing row, which is what makes updates fast for clients.

Everything else happens afterwards: the memtable is flushed to disk with sequential writes, and compaction later merges the flushed files (SSTables) based on timestamp.
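
The write path described above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not how Cassandra is implemented: `ToyNode` and its methods are made-up names, there is a single node, timestamps are passed in by the caller, and the commit log is never truncated.

```python
class ToyNode:
    """Toy model of the write path: commit log, memtable, flush, compaction."""

    def __init__(self):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory: key -> (timestamp, value)
        self.sstables = []    # immutable flushed tables, oldest first

    def write(self, key, value, timestamp):
        # An update is just another write: the old value is never read.
        self.commit_log.append((key, value, timestamp))
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        return "ack"

    def flush(self):
        # The memtable is written out in key order as a new immutable table.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        # Merge all flushed tables; for each key the newest timestamp wins.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts >= merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [dict(sorted(merged.items()))] if merged else []

    def read(self, key):
        # A read has to reconcile every version still lying around.
        candidates = [t[key] for t in self.sstables if key in t]
        if key in self.memtable:
            candidates.append(self.memtable[key])
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]


node = ToyNode()
node.write("alice@example.com", "alice", timestamp=1)
node.flush()
node.write("alice@example.com", "alice2", timestamp=2)  # the "update"
node.flush()
node.compact()
print(node.read("alice@example.com"))
```

In this model the second write never touches the first version. Both versions coexist until compaction keeps only the one with the newer timestamp, which is why an update costs the client about the same as an insert.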

Hammer
  • If your question is "how fast is the update", the answer is "as fast as an insert", but if your question is "how hard will it be to maintain this model (from the first question)", then the answer is "it's not going to be straightforward". – Matija Gobec May 20 '16 at 09:04