
Following DataStax's advice to "use roughly one table per query pattern," I have set up the same table twice, keyed differently to optimize read times.

-- This table supports queries that filter on specific first_ids and a gt/lt filter on time
CREATE TABLE IF NOT EXISTS table_by_first_Id
(
    first_id  INT,
    time      TIMESTAMP,
    second_id INT,
    value     FLOAT,
    PRIMARY KEY (first_id, time, second_id)
);

-- Same table, but rearranged to filter on specific second_ids and the same gt/lt time filter
CREATE TABLE IF NOT EXISTS table_by_second_Id
(
    second_id INT,
    time      TIMESTAMP,
    first_id  INT,
    value     FLOAT,
    PRIMARY KEY (second_id, time, first_id)
);
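For reference, these are the kinds of queries the two layouts are meant to serve (the ids, dates, and values below are illustrative, not from my actual data):

```sql
-- Served by table_by_first_Id: a specific first_id plus a gt/lt time range
SELECT * FROM table_by_first_Id
 WHERE first_id = 1
   AND time > '2020-01-01' AND time < '2020-02-01';

-- Served by table_by_second_Id: the same time range, keyed by second_id
SELECT * FROM table_by_second_Id
 WHERE second_id = 2
   AND time > '2020-01-01' AND time < '2020-02-01';
```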

Then, I created two models using DataStax's Python driver, one for each table.

class ModelByFirstId (...)
class ModelBySecondId (...)

The Problem

I can't figure out how to cleanly ensure that an insert into one table is atomically accompanied by an insert into the other. The only thing I can think of is

def insert_some_data(...):
    ModelByFirstId.create(...)
    ModelBySecondId.create(...)

I'm looking for an alternative way to ensure that an insert into one table is reflected in the other - perhaps in the model or table definition - in order to protect against errant inserts into just one of the models.

I'm also open to restructuring or remaking my tables altogether to accommodate this if needed.

  • Have you tried using a lightweight transaction for this use case? – Carlos Monroy Nieblas Feb 28 '20 at 17:45
  • Hmm, from what I can tell, LWTs require me to write raw CQL - and while this isn't really a problem, I want to see if I can leverage the modelling aspect of DataStax to reduce errors in the vein of SQLAlchemy (especially when collaborating with others). – aphrid Feb 28 '20 at 18:15

1 Answer


NoSQL databases, especially those designed for high availability and partition tolerance (the AP of CAP), are not built to provide referential integrity. Rather, they are designed for high throughput and low-latency reads and writes. Cassandra itself has no concept of referential integrity across tables, but do look into lightweight transactions (LWTs) and batches for your use case.
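For example, a logged batch groups the two inserts so that both are eventually applied, or neither is (atomicity, though not isolation - readers may briefly see one row before the other). Note that a batch spanning two tables touches multiple partitions, which carries a coordination cost. A sketch with illustrative values:

```sql
BEGIN BATCH
  INSERT INTO table_by_first_Id  (first_id, time, second_id, value)
  VALUES (1, '2020-02-28 17:45:00', 2, 3.14);
  INSERT INTO table_by_second_Id (second_id, time, first_id, value)
  VALUES (2, '2020-02-28 17:45:00', 1, 3.14);
APPLY BATCH;
```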

Some good material to read on the topic:

https://www.oreilly.com/content/cassandra-data-modeling/

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useBatch.html

Specifically for your use case, consider whether you can use a single-table data model like the one below:

CREATE TABLE IF NOT EXISTS table_by_Id
(
    primary_id INT,
    secondary_id INT,
    time      TIMESTAMP,
    value     FLOAT,
    PRIMARY KEY (primary_id, secondary_id, time)
);

For each input record, you create two entries in the table: one with first_id as primary_id (and second_id as secondary_id), and a second with second_id as primary_id (and first_id as secondary_id). Then use batch inserts, as described in the documentation linked above. This may not be the best solution for your problem, but it is worth considering.
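With the single-table model, the paired writes might look like this (illustrative values); since both rows land in the same table, a logged batch keeps them together:

```sql
BEGIN BATCH
  -- Orientation 1: first_id as the partition key
  INSERT INTO table_by_Id (primary_id, secondary_id, time, value)
  VALUES (1, 2, '2020-02-28 17:45:00', 3.14);
  -- Orientation 2: second_id as the partition key
  INSERT INTO table_by_Id (primary_id, secondary_id, time, value)
  VALUES (2, 1, '2020-02-28 17:45:00', 3.14);
APPLY BATCH;
```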