I have a situation similar to the one described in question Nested data data modelling in Cassandra?
I have a project entity for which my app needs to be able to display project specific information, including the agencies and vendors participating in the project. The project entity could be described as follows:
{
"id": 7162fe80-1e44-11e4-8c21-0800200c9a66,
"name": "Test Project",
"synopsis": "Lorem Text goes here"
"agencies" : [{
"id": c3e28810-1e44-11e4-8c21-0800200c9a66
"name": "Test Agency"
}],
"vendors": [{
"id": 1c0ba760-1e45-11e4-8c21-0800200c9a66
"name": "Test Vendor"
}]
}
However, sometimes the project might not have any vendors or agencies (or might have one of the entities, but not the other):
{
"id": 7162fe80-1e44-11e4-8c21-0800200c9a66,
"name": "Test Project",
"synopsis": "Lorem Text goes here"
"agencies" : [],
"vendors": []
}
What would be a good way of modeling this data?
I have tried the following schemas but all seem to have issues:
Schema 1:
CREATE TABLE projects (
id uuid,
name text,
synopsis text,
vendor_id uuid,
vendor_name text,
agency_id uuid,
agency_name text
PRIMARY KEY (id, vendor_id, agency_id)
But with this approach, I can't have projects with no vendors or agencies (vendor_id or agency_id cannot be null).
Schema 2:
CREATE TABLE projects (
id uuid,
name text,
synopsis text,
vendor_id uuid,
vendor_name text,
agency_id uuid,
agency_name text
PRIMARY KEY (id)
But with this approach, I can only have one vendor and one agency per project.
I am hesitant to use maps/lists/sets for modeling this data as this seems to be a new feature in CQL 2/3. I am also worried about "data consistency". For example, vendor names frequently change, and I would like projects to reflect the "latest name" of the vendor.