Primary key in cassandra is unique?

Question

It could be kind of lame but in cassandra has the primary key to be unique? For example in the following table:

CREATE TABLE users (
  name text,
  surname text,
  age int,
  adress text,
  PRIMARY KEY(name, surname)
);

So if is it possible in my database to have 2 persons in my database with the same name and surname but different ages? Which means same primary key..

Daniel · Accepted Answer · 2014-02-13T18:46:20.823

15

Yes the primary key has to be unique. Otherwise there would be no way to know which row to return when you query with a duplicate key.

In your case you can have 2 rows with the same name or with the same surname but not both.

edited Feb 13 '14 at 18:46

answered Feb 13 '14 at 18:39

Daniel

26,899
12
60
88

8

well.. to be fair, it could return both values if there were duplicates :) – Chip May 18 '16 at 19:33

score 10 · Answer 2 · answered Feb 14 '14 at 03:31

By definition, the primary key has to be unique. But that doesn't mean you can't accomplish your goals. You just need to change your approach/terminology.

First of all, if you relax your goal of having the name+surname be a primary key, you can do the following:

CREATE TABLE users ( name text, surname text, age int, address text, PRIMARY KEY((name, surname),age) );
insert into users (name,surname,age,address) values ('name1','surname1',10,'address1');
insert into users (name,surname,age,address) values ('name1','surname1',30,'address2');
select * from users where name='name1' and surname='surname1';

 name  | surname  | age | address
-------+----------+-----+----------
 name1 | surname1 |  10 | address1
 name1 | surname1 |  30 | address2

If, on the other hand, you wanted to ensure that the address is shared as well, then you probably just want to store a collection of ages in the user record. That could be achieved by:

CREATE TABLE users2 ( name text, surname text, age set<int>, address text, PRIMARY KEY(name, surname) );
insert into users2 (name,surname,age,address) values ('name1','surname1',{10,30},'address2');
select * from users2 where name='name1' and surname='surname1';

 name  | surname  | address  | age
-------+----------+----------+----------
 name1 | surname1 | address2 | {10, 30}

So it comes back to what you actually need to accomplish. Hopefully the above examples give you some ideas.

what is difference between `PRIMARY KEY((name, surname),age)` and `PRIMARY KEY(name, surname ,age)` — Prakash P, Feb 07 '17 at 12:51
@PrakashPandey Those are [Composite keys](http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refCompositePk.html). As stated on the provided link: A composite partition key is a partition key consisting of multiple columns. You use an extra set of parentheses to enclose columns that make up the composite partition key. The columns within the primary key definition but outside the nested parentheses are clustering columns. These columns form logical sets inside a partition to facilitate retrieval. — mschuurmans, Jul 31 '17 at 08:32

score 4 · Answer 3 · answered Feb 13 '14 at 18:39

4

The primary key is unique. With your data model, you can only have one age per (name, surname) combination.

answered Feb 13 '14 at 18:39

Richard

11,050
2
46
33

score 0 · Answer 4 · edited Oct 23 '18 at 07:29

Yes as mentioned in above comments you can have a composite key with name, surname, and age to achieve your goal but still, that won't solve the problem. Rather you can consider adding a new column userID and make that as the primary key. So even in case of name, surname and age duplicate, you don't have to revisit your data model.

CREATE TABLE users (
  userId int,
  name text,
  surname text,
  age int,
  adress text,
  PRIMARY KEY(userid)
);

score 0 · Answer 5 · answered Nov 15 '18 at 10:29

0

I would state specifically that partition key should be unique.I could not get it in one place but from the following statements.

Cassandra needs all the partition key columns to be able to compute the hash that will allow it to locate the nodes containing the partition.
The partition key has a special use in Apache Cassandra beyond showing the uniqueness of the record in the database..
Please note that there will not be any error if you insert same partition key again and again as there is no constraint check.
Queries that you'll run equality searches on should be in a partition key.

References

https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

how Cassandra chooses the coordinator node and the replication nodes?

Insert query replaces rows having same data field in Cassandra clustering column

answered Nov 15 '18 at 10:29

Alex Punnen

5,287
3
59
71

1

sorry, but it's incorrect - partition key isn't required to be unique - if you have uniq partition key then you have "skinny" partitions, consisting of only one row. If you have non-unique partition key, then you need to have clustering columns, so they form an uniq primary key. I have seen the customers where they had only one partition with millions of rows inside (due the data model errors) – Alex Ott Nov 15 '18 at 12:40
Thanks, it is bit confusing. But is not a node handling a range of tokens, so that a set of unique partitions map to one node and thereby handling this. Please also go through the links as I have quoted most from them. I did this test and check some time back so do not remember exactly – Alex Punnen Nov 17 '18 at 05:08
Question was about primary key that identifies individual row in database. Partition key used to calculate the token and select node to keep data on. Single partition may contain many rows, each identified by its own set of clustering keys inside partition. – Alex Ott Nov 17 '18 at 07:42

Primary key in cassandra is unique?

5 Answers5

Linked