I have a cassandra table
CREATE TABLE users_by_id (
id bigint PRIMARY KEY,
name text,
email text,
password text,
);
and
CREATE TABLE members_by_org_id_user_id (
organisation bigint,
user bigint,
roles set<bigint>,
PRIMARY KEY (organisation, user)
) WITH CLUSTERING ORDER BY (user DESC);
If I want to retrieve the name
of each member of an organisation I could:
SELECT user FROM members_by_org_id_user_id WHERE organisation = ?
and then performSELECT name FROM users_by_id WHERE id in ?
(coordinator must contact many nodes = bad Is the IN relation in Cassandra bad for queries?)SELECT user from MEMBERS_by_org_id_user_id WHERE organisation = ?
and then perform a query per userSELECT name FROM users_by_id WHERE id = ?
(application must contact many nodes, because the primary keys of the users are different = less bad? Not perfect)- Change
members_by_org_id_user_id
to
CREATE TABLE members_by_org_id_user_id (
organisation bigint,
user bigint,
name text,
email text,
PRIMARY KEY (organisation, user)
) WITH CLUSTERING ORDER BY (user DESC);
The problem with the third approach is, that if a user is updated, all the member rows that relate to that user will also need to be updated, which, while allowing the application to only contact one node, means many writes may need to be done for each update
How could I model my data to decrease each of these problems or get rid of them entirely?
An organisation could, theoretically, consist of up to 2000 members and a user could be in up to 20 organisations.