Modeling nested data with possible null values in cassandra

Question

I have a situation similar to the one described in question Nested data data modelling in Cassandra?

I have a project entity for which my app needs to be able to display project specific information, including the agencies and vendors participating in the project. The project entity could be described as follows:

{ "id": 7162fe80-1e44-11e4-8c21-0800200c9a66, "name": "Test Project", "synopsis": "Lorem Text goes here" "agencies" : [{ "id": c3e28810-1e44-11e4-8c21-0800200c9a66 "name": "Test Agency" }], "vendors": [{ "id": 1c0ba760-1e45-11e4-8c21-0800200c9a66 "name": "Test Vendor" }] }

However, sometimes the project might not have any vendors or agencies (or might have one of the entities, but not the other):

{ "id": 7162fe80-1e44-11e4-8c21-0800200c9a66, "name": "Test Project", "synopsis": "Lorem Text goes here" "agencies" : [], "vendors": [] }

What would be a good way of modeling this data?

I have tried the following schemas but all seem to have issues:

Schema 1: CREATE TABLE projects ( id uuid, name text, synopsis text, vendor_id uuid, vendor_name text, agency_id uuid, agency_name text PRIMARY KEY (id, vendor_id, agency_id)

But with this approach, I can't have projects with no vendors or agencies (vendor_id or agency_id cannot be null).

Schema 2: CREATE TABLE projects ( id uuid, name text, synopsis text, vendor_id uuid, vendor_name text, agency_id uuid, agency_name text PRIMARY KEY (id)

But with this approach, I can only have one vendor and one agency per project.

I am hesitant to use maps/lists/sets for modeling this data as this seems to be a new feature in CQL 2/3. I am also worried about "data consistency". For example, vendor names frequently change, and I would like projects to reflect the "latest name" of the vendor.

score 0 · Answer 1 · answered Aug 11 '14 at 11:21

If vendor names frequently change, it's probably not the best idea to de-normalize schema as you described: you have to update all vendor/agency records after each name change.

You can create typical normalized tables for projects, vendors and agencies and do joins in application level:

CREATE TABLE projects (
    id uuid,
    name text,
    vendor_id list<uuid>,
    agency_id list<uuid>,
    PRIMARY KEY (id));

CREATE TABLE vendors (
    id uuid,
    name text,
    PRIMARY KEY (id));

CREATE TABLE agencies (
    id uuid,
    name text,
    PRIMARY KEY (id));

PS. Not yet released C* 2.1 will have support for user-defined-types, so you can do this:

CREATE TYPE vendor (
    id uuid,
    name text);

CREATE TYPE agency (
    id uuid,
    name text);

CREATE TABLE projects (
    id uuid,
    name text,
    vendors list<vendor>,
    agencies list<agency>,
    PRIMARY KEY (id));

Modeling nested data with possible null values in cassandra

1 Answers1