
I am using the table below in our use case:

create table test_new (
    employee_id text,
    employee_name text,
    value text,
    last_modified_date timeuuid,
    primary key (employee_id, last_modified_date)
   );

create index employee_name_idx on test_new (employee_name);

In the table above, employee_id will always be unique, ranging from 1 to 32767. Our query patterns are as follows (sketched in CQL after the list):

  1. Give me everything for a given employee_id.
  2. Give me everything that has changed in the last 5 minutes.
  3. Give me everything for a given employee_name.
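For reference, this is roughly how those patterns can be expressed against the schema above (a sketch; the timestamp literal is only an example, and it assumes the minTimeuuid function is available):

-- 1. Everything for a given employee_id (partition key lookup):
select * from test_new where employee_id = '1';

-- 2. Changes in the last 5 minutes for one employee. The clustering column
--    can only be range-restricted together with the partition key, so a
--    global "last 5 minutes" query would need a different table layout:
select * from test_new
 where employee_id = '1'
   and last_modified_date > minTimeuuid('2013-11-13 17:00:00');

-- 3. Everything for a given employee_name (uses the secondary index):
select * from test_new where employee_name = 'e27';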

I will be inserting the data below into the table above:

insert into test_new (employee_id, employee_name, value, last_modified_date) 
        values ('1', 'e27',  'some_value', now());
insert into test_new (employee_id, employee_name, value, last_modified_date) 
        values ('2', 'e27',  'some_new_value', now());
insert into test_new (employee_id, employee_name, value, last_modified_date) 
        values ('3', 'e28',  'some_new_again_value', now());

I am able to execute all of the above query patterns, but there is still one issue.


My question is about avoiding the scenario below. What if someone mistakenly executes the query below? It will create another row with employee_id 1 and different values for the other fields. I don't want anyone to insert the same employee_id again if it is already in the Cassandra database.

insert into test_new (employee_id, employee_name, value, last_modified_date) 
         values ('1', 'e29',  'some_new_value', now());
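To illustrate (expected output, not an actual trace): because last_modified_date is part of the primary key and now() always generates a fresh timeuuid, the insert above does not overwrite anything; employee_id '1' ends up with two rows:

select * from test_new where employee_id = '1';

--  employee_id | last_modified_date | employee_name | value
-- -------------+--------------------+---------------+----------------
--            1 | <older timeuuid>   | e27           | some_value
--            1 | <newer timeuuid>   | e29           | some_new_value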

Any thoughts? I know this is debatable, given the RDBMS vs. Cassandra discussion.

Also, will creating an index on employee_name cause any problems? In my example, the same employee_name can map to multiple employee_ids with different values. Keep in mind that employee_id won't be larger than 32767, which means the table above will never hold more than 32767 employees.

I am running Cassandra 1.2.9

AKIWEB
  • If (employee_id:value) is unique (or meant to be) then why is that not your PK? That would stop duplicate inserts (though it will act as an update, so your last_modified_date would change). – AndySavage Nov 13 '13 at 17:26
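A sketch of the schema that comment suggests (hypothetical table name test_new_by_id; employee_id alone as the primary key, so re-inserting the same id acts as an update rather than creating a new row):

create table test_new_by_id (
    employee_id text,
    employee_name text,
    value text,
    last_modified_date timeuuid,
    primary key (employee_id)
);

-- Both inserts target the same primary key, so the second simply
-- overwrites the first (upsert semantics):
insert into test_new_by_id (employee_id, employee_name, value, last_modified_date)
        values ('1', 'e27', 'some_value', now());
insert into test_new_by_id (employee_id, employee_name, value, last_modified_date)
        values ('1', 'e29', 'some_new_value', now());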

1 Answer


I don't want anyone to insert the same employee_id again if it is already there in the Cassandra database

The only way Cassandra offers to guarantee that ("insert only if no row with the same PK already exists") is the conditional inserts/updates introduced in Cassandra 2.0: http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
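As a sketch (this requires upgrading to 2.0, and a table where employee_id alone is the primary key; with your current compound key, IF NOT EXISTS only guards the full (employee_id, last_modified_date) combination):

insert into test_new (employee_id, employee_name, value, last_modified_date)
        values ('1', 'e29', 'some_new_value', now())
        if not exists;
-- The result contains an [applied] column: true if the row was inserted,
-- false if a row with that primary key already existed.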

But be warned that the performance of that is not very good. If you don't add new employees too often then it might be just what you want, but if it's a query that is done a lot and contention is likely, this probably won't work all that well. That said, the fact that you won't need more than 32K employee_id values suggests that adding a new employee is not a frequent query at all.

That being said, if the only worry is that you don't reuse the same employee_id twice, the standard solution in C* is to just use a uuid for the employee_id so you don't have to worry about collisions.
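A sketch of that variant (hypothetical table name; the uuid literal stands in for a value generated client side, e.g. by your driver):

create table test_new_uuid (
    employee_id uuid,
    employee_name text,
    value text,
    last_modified_date timeuuid,
    primary key (employee_id, last_modified_date)
);

-- Each employee gets a client-generated uuid, so two inserts can never
-- collide on employee_id by accident:
insert into test_new_uuid (employee_id, employee_name, value, last_modified_date)
        values (62c36092-82a1-3a00-93d1-46196ee77204, 'e27', 'some_value', now());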

pcmanus