
I am facing a peculiar problem in DSE 3.2.4. Here is my table structure:

CREATE TABLE tbl_samp (
  PK text,
  CK1 varint,
  CK2 text,
  CK3 varint,
  value float,
  PRIMARY KEY (PK, CK1, CK2, CK3)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

I am dumping a huge amount of data from Pig into Cassandra using CqlStorage().

I have around 1.12 million distinct combinations of (PK, CK1, CK2, CK3).

Here is my Pig relation:

reqDataCQL = foreach reqData generate TOTUPLE(TOTUPLE('PK',PK), TOTUPLE('CK1',CK1), TOTUPLE('CK2',CK2), TOTUPLE('CK3',CK3)), TOTUPLE(value);

store reqDataCQL into 'cql://MyKeyspace/tbl_samp?output_query=update+MyKeyspace.tbl_samp+set+value+%3D+%3F' using CqlStorage();
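For reference, the URL-encoded output_query above decodes to the prepared statement below; as I understand CqlStorage, it appends the WHERE clause itself from the key tuples in each record:

    update MyKeyspace.tbl_samp set value = ?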

After the Pig job finished, I can see the following:

Input(s):
Successfully read 34327 records from: "/user/k/Input.txt"
Successfully read 4 records from: "cql://MyKeySpace/mappingtable"

Output(s):
Successfully stored 1128902 records in: "cql://MyKeySpace/tbl_samp?output_query=update+conflux.to1+set+value+%3D+%3F"

But when I query the table tbl_samp, I can see only around 8,600 records, which are the distinct combinations of (PK, CK1).

Here is my count query:

    select count(1) from tbl_samp limit 2000000;

 count
-------
  8681

Is there a gap in my understanding of composite keys?

I know PK is my row key, and the (CK1, CK2, CK3) combinations with value will be my column names.

My understanding of Cassandra composite storage is:

PK,(CK1|CK2|CK3|value:1),(CK11|CK22|CK33|value:11)
PK1,(CK111|CK222|CK333|value:111)

Please help me with this.

sudheer

2 Answers


For your primary key (PK, CK1, CK2, CK3):

The partition key is PK; it decides which partition the row goes into. Inside a partition, each unique combination of CK1, CK2 and CK3 defines a row (in storage terms, the column name). So all the columns in the primary key together make up a unique reference. If you insert multiple entries with the same PK, CK1, CK2 and CK3, the last write wins.
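A quick sketch of that last-write-wins behaviour, using the table from the question (the key values are made up for illustration):

    UPDATE tbl_samp SET value = 1.0 WHERE PK = 'a' AND CK1 = 1 AND CK2 = 'x' AND CK3 = 1;
    UPDATE tbl_samp SET value = 2.0 WHERE PK = 'a' AND CK1 = 1 AND CK2 = 'x' AND CK3 = 1;
    -- Both statements target the same full primary key, so the second
    -- overwrites the first: the table ends up with a single row.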

What's your CQL query? What's the replication factor of the keyspace? What consistency level are you specifying for the reads and writes? It could be that your read and write consistency levels (RC and WC) are low, so you're reading from replicas that haven't been written to yet.
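For example, you can raise the read consistency in cqlsh before counting (CONSISTENCY is a cqlsh command, not part of CQL itself):

    CONSISTENCY QUORUM;
    select count(1) from tbl_samp limit 2000000;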

ashic
  • I have 1.12 million DISTINCT combinations of PK, CK1, CK2 and CK3, but when I store using CqlStorage() it is dumping only ~8k records, which are the DISTINCT combinations of PK and CK1 – sudheer Jul 29 '14 at 09:20
  • What's the replication factor on the table? Is there a way in Pig to specify the consistency level Cassandra will use? Try setting the replication factor of the keyspace to 3, using a write consistency of QUORUM, and using CONSISTENCY QUORUM for your select query. If the replication factor is more than one and Pig is writing with ONE, then try consistency ALL for your select query. You wouldn't use that in production, but see if it gives you the expected count. – ashic Jul 29 '14 at 10:12

Sorry, it's my fault; my understanding of composite keys was correct. I have a UDF where I am overwriting this combination of (PK, CK1, CK2, CK3).

So, in general, Cassandra stores data based on the partition key, and the combination of partition key and clustering columns identifies each row.

The column names will be the unique combination of clustering columns:

PK,(CK1|CK2|CK3|value:1),(CK11|CK22|CK33|value:11)
PK1,(CK111|CK222|CK333|value:111)

Thanks,

sudheer