Cassandra update fails silently with several nodes

Question

I have the following situation.

There is a CQL table (Cassandra 2.0.12):

CREATE TABLE article (
  version timeuuid,
  id timeuuid,
  active boolean,
  contentbody text,
  contentformat text,
  createdat text,
  entitytype text,
  externalsources list<text>,
  geolat double,
  geolong double,
  lastcomments list<text>,
  lastmodifiedat text,
  lstmodbyuserid text,
  lstmodbyusername text,
  previewimage text,
  publishedatarticle text static,
  publishedatver text,
  status text,
  subcategory text,
  subtitle text,
  title text,
  userid text static,
  username text static,
  PRIMARY KEY ((version), id)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

and I work with it through the datastax-java-driver (cassandra-driver-core 2.1.1).

When the cluster contains 3 nodes, a data update like

UPDATE article SET
  title='updated title2',
  subtitle=null,
  status='draft',
  subCategory='{"id":"a6b68330-2ef5-4267-98c5-cd793edbb1a8","name":"sub cat name","color":"blue","parentCategory":{"id":"prim_cat_id","name":"prim cat name","color":"blue"}}',
  contentBody='someOtherBody',
  contentFormat='someOtherFormat',
  geoLat=138782.34,
  geoLong=138782.34,
  lastModifiedAt='2015-03-02 11:14:57',
  publishedAtArticle=null,
  publishedAtVer=null,
  lstModByUserId='e264fb2c-2485-488a-965f-765d139be9ea',
  lstModByUsername='reg1 user',
  externalSources=[],
  previewImage='{"width":1,"height":2,"duration":32,"original":"orig string","thumbs":{"prefix":"prefix str","ext":"jpg","sizes":["size1","size2"]}}'
WHERE version=2480d891-c0cd-11e4-a691-df79ef55172c AND id=2480d890-c0cd-11e4-a691-df79ef55172c;

fails silently in about half of the cases: I see no errors in the Cassandra logs, nothing suspicious in the traces, and no failure response or exception. I can tell that it didn't succeed only by running a SELECT. With a single-node cluster, it always works.

Could you point me in a direction for investigating this?

What do you mean it doesn't work? Are you inserting this record over and over? What exactly are you checking via SELECT? – phact, Mar 02 '15 at 14:41
What is the keyspace replication factor? What is the insert consistency level? What is your read consistency level? – Roman Tumaykin, Mar 02 '15 at 21:52
@phact, yes, I am inserting a record to an existing key (actually updating with insert syntax) and selecting by primary key. Repeatedly, in about half of the cases, the insert doesn't report any problem, but a subsequent select shows that the record was not updated. – Olga Gorun, Mar 03 '15 at 15:32
@RomanTumaykin, keyspace description: `CREATE KEYSPACE pheed WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '3' };` I don't think that the consistency level is relevant, because I tried different intervals between INSERT and SELECT (from 1 s to 2 minutes), and this didn't influence the results. – Olga Gorun, Mar 03 '15 at 15:37
Do you see any dropped mutations using nodetool tpstats? Try running a repair after you notice the problem and check whether the repair fixes it. – Roman Tumaykin, Mar 03 '15 at 16:01
@OlgaGorun also please check if the time on all of your nodes is in sync. – Roman Tumaykin, Mar 03 '15 at 16:26
Additional idea: can it be related to a time synchronisation problem between the nodes? – Olga Gorun, Mar 03 '15 at 16:27
@RomanTumaykin, thank you. It seems to be my case. Will check. – Olga Gorun, Mar 03 '15 at 16:31
I noticed you are assigning an empty list here, `externalSources=[]`. This effectively creates a tombstone for this column and is generally not good practice. It's probably not the source of your problem here, but also try updating collections according to the documentation and see if this helps: http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_updating_collection_c.html – markc, Feb 09 '16 at 09:15

score 5 · Accepted Answer · answered Mar 03 '15 at 16:42

Since you mentioned that the clocks on your nodes aren't in sync, you may be hitting a rare but possible condition.

If time is not in sync between the nodes, updates and inserts can produce unpredictable results.

Normally, when writes to the same cell conflict, the one with the latest timestamp wins. If one of your nodes has a clock that is far behind, then whenever it acts as the coordinator it stamps your writes with its own, older timestamp. The update then looks older than the data already stored with newer timestamps (written through the nodes with accurate time), so it loses the comparison and is effectively discarded.
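The effect described above can be illustrated with a small simulation. This is only a sketch, not Cassandra code: `Cell`, `Coordinator`, and `reconcile` are hypothetical names invented for the example, the 5-second clock lag is an assumed value, and ties between equal timestamps are simplified to "keep the existing cell".

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds


class Coordinator:
    """Hypothetical stand-in for a node that stamps writes with its own clock."""

    def __init__(self, clock_offset_us: int):
        # Negative offset means this node's clock runs behind real time.
        self.clock_offset_us = clock_offset_us

    def stamp(self, true_time_us: int, value: str) -> Cell:
        return Cell(value, true_time_us + self.clock_offset_us)


def reconcile(existing: Cell, incoming: Cell) -> Cell:
    """Last-write-wins: keep whichever cell carries the higher timestamp."""
    return incoming if incoming.timestamp_us > existing.timestamp_us else existing


accurate = Coordinator(clock_offset_us=0)
lagging = Coordinator(clock_offset_us=-5_000_000)  # assumed: 5 s behind

t0 = 1_425_294_897_000_000  # arbitrary starting instant, in microseconds
stored = accurate.stamp(t0, "original title")

# One real second later the update arrives, but through the lagging node,
# so its timestamp ends up 4 s *older* than the stored cell's timestamp.
update = lagging.stamp(t0 + 1_000_000, "updated title2")
stored = reconcile(stored, update)

print(stored.value)  # original title
```

No error is raised anywhere in this flow, which matches the silent behaviour in the question: the write is accepted, it simply loses the timestamp comparison. Sending the same update through `accurate` instead would make it win.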

Roman Tumaykin


Use NTP! http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k__synchronize-clocks – phact Mar 06 '15 at 15:08
How large a time difference is needed for this condition to occur? Can a difference of milliseconds cause this issue? – Mar 11 '16 at 10:25
In an edge scenario, when 2 inserts/updates arrive within milliseconds of each other, a clock difference of milliseconds can cause this issue. – Roman Tumaykin Mar 15 '16 at 06:39
