0

I am trying to better understand what level I can actually mix static and dynamic columns in CQL3. I creating a table in CQL 3 to store user subscription data, but I am also testing out methods to store a tenant_id (with a secondary index) in the table as well so that I can quickly identify what users belong to what tenants (a tenant being a higher-level entity to which the user belongs).

My create table statement is as follows, followed by some dummy data inserts:

cqlsh:demodb> CREATE TABLE subscription_by_user (
              user_id text,
              tenant_id uuid,
              subscription_id int,
              type text,
              distribution int,
              PRIMARY KEY (user_id, subscription_id) );

cqlsh:demodb> CREATE INDEX subscription_ids ON subscription_templates_by_user (tenant_id);


cqlsh:demodb> INSERT INTO subscription_by_user (user_id, tenant_id, subscription_id, type, distribution) VALUES ('user1', f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 2, 'MESSAGE', 4);

cqlsh:demodb> INSERT INTO subscription_by_user (user_id, tenant_id, subscription_id, type, distribution) VALUES ('user2', f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 3, 'TOPIC', 5);

cqlsh:demodb> INSERT INTO subscription_by_user (user_id, tenant_id, subscription_id, type, distribution) VALUES ('user1', f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 3, 'USER', 4);

This is all well and good and I get back what I expect to using the cql 3 API:

cqlsh:demodb> SELECT * FROM subscription_by_user  ;

 user_id | subscription_id | distribution | tenant_id                            | type
---------+-----------------+--------------+--------------------------------------+------------
   user2 |               3 |            5 | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 | TOPIC
   user1 |               2 |            4 | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 | BOARD
   user1 |               3 |            4 | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 | USER

However, the underlying storage of the row results in Cassandra duplicating tenant_id for every new subscription:

[default@demodb] list subscription_templates_by_user;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: user2
=> (column=3:, value=, timestamp=1366150799244000)
=> (column=3:distribution, value=00000005, timestamp=1366150799244000)
=> (column=3:tenant_id, value=f81d4fae7dec11d0a76500a0c91e6bf6, timestamp=1366150799244000)
=> (column=3:type, value=746573742d7479706532, timestamp=1366150799244000)
-------------------
RowKey: user1
=> (column=2:, value=, timestamp=1366150764854000)
=> (column=2:distribution, value=00000004, timestamp=1366150764854000)
=> (column=2:tenant_id, value=f81d4fae7dec11d0a76500a0c91e6bf6, timestamp=1366150764854000)
=> (column=2:type, value=746573742d74797065, timestamp=1366150764854000)
=> (column=3:, value=, timestamp=1366151741325000)
=> (column=3:distribution, value=00000004, timestamp=1366151741325000)
=> (column=3:tenant_id, value=f81d4fae7dec11d0a76500a0c91e6bf6, timestamp=1366151741325000)
=> (column=3:type, value=746573742d74797065, timestamp=1366151741325000)

My question is: is there a way for me (using CQL 3) to structure the table in a way where subscription_id is only listed once per row? If not, is it possible to do this using the older cassandra-cli API? It seems like this would be a viable use case and is really a question of mixing static data with dynamic data (i.e., the subscription bits), but I could very well be wrong. I know that maps and the other CQL 3 collection types are one way of doing this, but the fact that I must retrieve the entire collection at once makes me a bit nervous and I would rather not serialize some complex object into the map value slot if I can help it.

Thanks!

Paul Cichonski
  • 388
  • 1
  • 6

1 Answers1

2

If you don't need to perform range queries by subscription, or query "what are all the subscriptions for user X", then you can can make it part of the partition key:

CREATE TABLE subscription_by_user (
     user_id text,
     tenant_id uuid,
     subscription_id int,
     type text,
     distribution int,
     PRIMARY KEY ((user_id, subscription_id))
);

If you do need either of those, then suck it up and take the storage space penalty; most of it will be compressed away. This is true whether you are using CQL or Thrift.

jbellis
  • 19,347
  • 2
  • 38
  • 47