
Imagine an e-commerce application:

Let's say I have a three-node cluster, N1, N2, N3, and my consistency levels (CL) are weak. That is:

Read CL = N/2 + 1 = 2 (in this case), Write CL = ANY (at least 1)

I have a product table such as the following.

This is the initial data, in sync across all three nodes:

 product_info : { 'computer': 1}
  1. Now Client A reads the info from N1 and Client B reads the info from N2.

    Client A sees 1 computer is available.

    Client B sees 1 computer is available.

  2. Both of them now go to buy. Client A places the order first, so at N1 the table will look like the following:

    product_info : {'computer':0}

  3. Now Client B places an order, so at N2 the table will look like the following:

    product_info : {'computer':0}

    But in reality, Client B's order should not have been processed.

  4. Client C accesses through N3. Since a quorum read requires at least 2 nodes to respond, a read is also done at N1, which returns 0. N3 has the value 1, but its timestamp is outdated, so it updates its value and shows the client that no computers are available. This is good.

    In this example, both weak and strong consistency levels lead to the wrong result, simply because the data was in sync at the time product_info was first read by Clients A and B. How can this be handled in Cassandra?

brain storm

1 Answer


You haven't mentioned your replication factor.

If your read consistency + write consistency > replication factor, you WILL get immediate consistency.

Let's say your replication factor is 3. For immediate consistency with RC = 2, you will need WC of at least 2. If you want immediate consistency with WC = 1, your RC will need to be 3. Note, this would seriously impact availability, as a single node going down would mean you can't read.
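As an aside, you can try these combinations out in cqlsh, which lets you set the consistency level for the session (the keyspace and table names below are just illustrative, not from your post):

    CONSISTENCY QUORUM;
    SELECT * FROM shop.product_info WHERE product = 'computer';

With RF = 3, QUORUM reads and QUORUM writes give you 2 + 2 > 3, i.e. immediate consistency.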

Immediate consistency means that you will read whatever's been written, i.e. after a successful write, no read will return the previous value. However, this does NOT prevent your application from acting on a value it has previously read.

You can use lightweight transactions in this case: UPDATE ..... IF [some condition]. This will perform slower, but may be enough for your use case.
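A sketch of what that could look like for your scenario (the table layout is assumed; your actual schema may differ):

    UPDATE product_info
    SET stock = 0
    WHERE product = 'computer'
    IF stock = 1;

If Client A's order has already decremented the stock, Client B's IF condition fails and the update is not applied; the result row comes back with [applied] = False, so the application can reject the second order instead of overselling.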

Quite often, especially in distributed scenarios, it is better to deal with failure (even make it a business case) instead of trying to prevent anything "bad" from ever happening. Edge cases like this are opportunities to talk with the business, and find hidden opportunities:

  • What happens if we overbook an item?
  • Is it better to cancel an order, or to let the customer know that their order has been unavoidably delayed, possibly keeping the sale and giving them a gift voucher?
  • Can we give the customer a slightly better computer, taking a slight hit on profit? This can help us make the sale, satisfy the customer, and possibly earn return business. Dell often does this.
  • Can we call up the customer and explain the scenario, potentially upselling?

We can even accept the order and let the customer know when we find that there's an issue - I've personally seen this with Amazon.

If we absolutely must prevent any overselling at sale time, then there are patterns for that as well. We can use a distributed lock, using something like Raft or even ZooKeeper, to handle coordination outside of Cassandra. We can also implement logical locks for each item, with TTLs to ensure messy code doesn't mess up inventories.
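A minimal sketch of such a logical lock in CQL, assuming a dedicated item_locks table (not part of your schema): acquiring the lock is a lightweight-transaction insert, and the TTL releases it automatically if the client crashes before finishing.

    -- each client tries to claim the item before selling it;
    -- the lock expires on its own after 30 seconds
    INSERT INTO item_locks (item, owner)
    VALUES ('computer', 'client-a')
    IF NOT EXISTS
    USING TTL 30;

Only one client's insert is applied; the others see [applied] = False and must back off or retry later.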

It really depends on how strong a guarantee you want, and how much trouble you're willing to go through to achieve it. And more so, whether it might be more profitable not to solve it at all.

Hope that helps.

ashic
  • What happens with immediate consistency during a write: say two out of three nodes drop off the network. Is the write a success for the client? I mean, the write has failed at the client level, because that's what the client is told, but one node (of the three) has overwritten the value. – brain storm Jul 24 '14 at 22:00
  • If WC is 2, then at least 2 replicas must acknowledge the write for the client to receive a success. So, if two replica nodes for a partition are down with a replication factor of three, if WC = 2 is specified, the write will fail. It's similar in case of reads - n=RC nodes must respond for a read to be successful. This is a way of trading availability for consistency. – ashic Jul 24 '14 at 22:09
  • The write has failed as far as the client is concerned. But one of the nodes has written the value, and there is no rollback feature. So when the failed nodes come back, they get the latest value via gossip. Although the write was actually executed, the client sees it as if it had failed. – brain storm Jul 24 '14 at 22:23
  • I could be wrong, but it looks like unsuccessful commits aren't applied. Just set up a cluster, and tried out that scenario. Seems that the "unsuccessful" write isn't returned from any node, even with read consistency one. Now this could be due to the coordinator already knowing that the replicas are down and not issuing the write...I'll try to find the exact reason, and if this is always the case. – ashic Jul 25 '14 at 00:10
  • Update: It appears that the reason my demo "worked" is that information about down nodes is propagated through gossip, and the coordinator simply won't send out requests if it knows nodes are down. If requests are sent out and a node then dies, the scenario you're describing can happen. The recommendation for a write that fails on the client is to retry, and that's usually OK, as inserts and updates are basically upserts. Be careful with counter columns though. The workaround used will undoubtedly depend on your use case and the level of guarantee needed. – ashic Jul 25 '14 at 01:15