
I'm using Cassandra 2.1 and have a model that roughly looks as follows:

CREATE TABLE events (
  client_id bigint,
  bucket int,
  timestamp timeuuid,
  ...
  ticket_id bigint,
  PRIMARY KEY ((client_id, bucket), timestamp)
);
CREATE INDEX events_ticket ON events(ticket_id);

As you can see, I've created a secondary index on ticket_id. This index works OK. events contains around 100 million rows, but only 5 million of those rows have a ticket_id set, spread over roughly 50,000 distinct tickets. So a ticket has, on average, about 100 events.

Querying the secondary index works without supplying the partition key, which is convenient in our situation, since the bucket column is sometimes hard to determine beforehand (i.e. you would need to know the date of the events; bucket is currently the date).

cqlsh> select * from events where ticket_id = 123;

 client_id | bucket | timestamp | ... | ticket_id
-----------+--------+-----------+-----+-----------

(0 rows)

How do I solve the problem when all events of a ticket should be moved to another ticket? I.e. the following query won't work:

cqlsh> UPDATE events SET ticket_id = 321 WHERE ticket_id = 123;
InvalidRequest: code=2200 [Invalid query] message="Non PRIMARY KEY ticket_id found in where clause"

Does this imply secondary indexes cannot be used in UPDATE queries?

What model should I use to support these changes?

hvtilborg

2 Answers


First of all, UPDATE and INSERT operations are treated the same in Cassandra. They are colloquially known as "UPSERTs."

Does this imply secondary indexes cannot be used in UPDATE queries?

Correct. You cannot perform an UPSERT in Cassandra without specifying the complete PRIMARY KEY. Even UPSERTs with a partial PRIMARY KEY will not work. And (as you have discovered) UPSERTing by an indexed value does not work, either.

How do I solve the problem when all events of a ticket should be moved to another ticket?

Unfortunately, the only way to accomplish this is to query the keys of each row in events (with a particular ticket_id) and then UPSERT the new ticket_id by those keys. The nice thing is that you don't have to DELETE the rows first, because ticket_id is not part of the PRIMARY KEY.
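In cqlsh, the lookup step through the secondary index looks roughly like this (a sketch; ticket_id 123 is the example value from the question):

SELECT client_id, bucket, timestamp FROM events WHERE ticket_id = 123;

Each (client_id, bucket, timestamp) tuple it returns then goes into an UPDATE that sets the new ticket_id, like the batched statements further down.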

What model should I use to support these changes?

I think your best plan here would be to forgo the secondary index altogether and create a query table to work alongside your events table:

CREATE TABLE eventsbyticketid (
  client_id bigint,
  bucket int,
  timestamp timeuuid,
  ...
  ticket_id bigint,
  PRIMARY KEY ((ticket_id), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

This would allow you to query by ticket_id quickly, to obtain the client_id, bucket, and timestamp values you need to UPSERT the new ticket_id on your events table.
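For example, the lookup against the query table would be something like (same example ticket as above):

SELECT client_id, bucket, timestamp FROM eventsbyticketid WHERE ticket_id = 123;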

You could also then perform a DELETE by ticket_id (on the eventsbyticketid table). Cassandra does allow a DELETE operation with a partial PRIMARY KEY, as long as you have the full partition key (ticket_id). So removing old ticket_ids from the query table would be easy. And to ensure write atomicity, you could batch the UPSERTs together:

BEGIN BATCH
  UPDATE events SET ticket_id = 321 WHERE client_id=2112 AND bucket=20150422 AND timestamp=4a7e2730-e929-11e4-88c8-21b264d4c94d;
  UPDATE eventsbyticketid SET client_id=2112, bucket=20150422 WHERE ticket_id=321 AND timestamp=4a7e2730-e929-11e4-88c8-21b264d4c94d;
APPLY BATCH;

That is effectively the same as performing:

BEGIN BATCH
  INSERT INTO events (client_id,bucket,timestamp,ticket_id) VALUES (2112,20150422,4a7e2730-e929-11e4-88c8-21b264d4c94d,321);
  INSERT INTO eventsbyticketid (client_id,bucket,timestamp,ticket_id) VALUES (2112,20150422,4a7e2730-e929-11e4-88c8-21b264d4c94d,321);
APPLY BATCH;
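Once the events have been re-keyed, removing the stale rows for the old ticket_id from the query table is just the partition-key DELETE mentioned above:

DELETE FROM eventsbyticketid WHERE ticket_id = 123;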

Side note: timestamp is the name of a CQL data type (and a keyword) in Cassandra. That makes it a pretty lousy name for a timeuuid column.

Aaron

You can use the secondary index to query the events for the old ticket, and then use the primary key columns from those retrieved rows to update them.
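A minimal sketch of that approach in cqlsh (the key values are placeholders):

SELECT client_id, bucket, timestamp FROM events WHERE ticket_id = 123;
-- then, once per returned row:
UPDATE events SET ticket_id = 321 WHERE client_id=2112 AND bucket=20150422 AND timestamp=4a7e2730-e929-11e4-88c8-21b264d4c94d;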

I'm not sure why you need to do this manually; it seems like something Cassandra should be able to do under the hood.

Roel Harbers