Cassandra: How to model column family to perform intersect-like queries

Question

Basically I have three attributes: partId, measurementDef and value. Each part (partId) consists of multiple measures (value) of a certain type (measurementDef).

Formatted as a tree it would look something like this:

-part 1
  |- measurementDef 1 -> 15,86
  |- measurementDef 2 -> 19,54
-part 2
  |- measurementDef 1 -> 21,21
  |- measurementDef 3 -> 65,54
  |- measurementDef 4 -> 12,54
-part 3
   ...

Now my question is: How should I model my column family to do something like this:

SELECT partId
FROM <table>
WHERE measurementDef = xxx AND value > 10
INTERSECT
SELECT partId
FROM <table>
WHERE measurementDef = yyy AND value < 50

In other words: I want to find all parts, whose value for measurementDef xxx is higher 10 and whose value for measurementDef yyy is lower 50.

score 1 · Accepted Answer · answered Jul 18 '13 at 12:49

AFAIK, there is no modelling approach to make intersection within single query. I suggest to use following table design:

create table mdefparts(
    mdef int,
    value float,
    parts set<uuid>,
    primary key(mdef, value)
);

Then use queries:

select parts from mdefparts where mdef=XXX and value > 10;
select parts from mdefparts where mdef=YYY and value < 50;

Then join all sets from the first query into one set (say, set1).

Join all sets from the second query into set2.

Then just intersect set1 and set2.

score 0 · Answer 2 · answered Jul 18 '13 at 14:11

There might be an alternative.

You could create a "dummy" column that are set to one or zero during insertion if a threshold is exceed. Then, create a composite column index like so:

    CREATE TABLE mdefparts (
            part int,
            name text,
            val double,
            time timeuuid,
            exceeded boolean,
            PRIMARY KEY ((part, name), exceeded, time)
    );

This would work like so:

    cqlsh:test> CREATE TABLE mdefparts (
            ... part int,
            ... name text,
            ... val double,
            ... time timeuuid,
            ... exceeded boolean,
            ... PRIMARY KEY ((part, name), exceeded, time)
            ... );
    cqlsh:test> 
    cqlsh:test> insert into mdefparts (part, name, val, time, exceeded) values (0, 'y', 100, 37a5de5c-efb3-11e2-99d0-f23c91aec05e, true);
    cqlsh:test> insert into mdefparts (part, name, val, time, exceeded) values (0, 'x', 100, 37a5de6c-efb3-11e2-99d0-f23c91aec05e, true);
    cqlsh:test> insert into mdefparts (part, name, val, time, exceeded) values (0, 'y', 10, 37a5de7c-efb3-11e2-99d0-f23c91aec05e, false);
    cqlsh:test> insert into mdefparts (part, name, val, time, exceeded) values (1, 'y', 1, 37a5de8c-efb3-11e2-99d0-f23c91aec05e, false);
    cqlsh:test> select * from mdefparts;

     part | name | exceeded | time                                 | val
    ------+------+----------+--------------------------------------+-----
        0 |    y |    False | 37a5de7c-efb3-11e2-99d0-f23c91aec05e |  10
        0 |    y |     True | 37a5de5c-efb3-11e2-99d0-f23c91aec05e | 100
        1 |    y |    False | 37a5de8c-efb3-11e2-99d0-f23c91aec05e |   1
        0 |    x |     True | 37a5de6c-efb3-11e2-99d0-f23c91aec05e | 100

    cqlsh:test> select * from mdefparts where name in ('x','y') and part = 0 and exceeded = true;

     part | name | exceeded | time                                 | val
    ------+------+----------+--------------------------------------+-----
        0 |    x |     True | 37a5de6c-efb3-11e2-99d0-f23c91aec05e | 100
        0 |    y |     True | 37a5de5c-efb3-11e2-99d0-f23c91aec05e | 100

The core idea here was here to model the query into the column family. I added a timeuuid to allow multiple measurements over time.

Interesting solution, but this requires to know the threshold up front. — Thomas Fischer, Jul 19 '13 at 06:32
It would need to be defined in your application, yes. You can improve on this by adding to "exceeded" an integer column that denotes value ranges recorded in val. For instance: 0: 0-19 1: 20-99 2: 100-199 3: 200-499 This would be more fine grained and you could detect certain ranges of values you are interested in more easily. Just add a secondary index on this column. Hope that helps! — John, Jul 19 '13 at 07:58

Cassandra: How to model column family to perform intersect-like queries

2 Answers2