I want to store one value per second in a table. To decide on a schema, I tested two approaches against each other. If I have understood correctly, the data should be stored almost identically on disk.
Wide-Row
CREATE TABLE timeseries (
id int,
date date,
timestamp timestamp,
value decimal,
PRIMARY KEY ((id, date), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
  AND compaction = {'class': 'DateTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'DeflateCompressor'};
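To make the write pattern concrete, a single measurement in the wide-row table would be written roughly like this (the id, date and values are only illustrative):

-- hypothetical sample write into the wide-row table:
-- every measurement becomes its own clustering row under (id, date)
INSERT INTO timeseries (id, date, timestamp, value)
VALUES (1, '2015-06-01', '2015-06-01 00:01:00', 0.0175);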
Skinny Row
CREATE TABLE timeseries(
id int,
date date,
"0" decimal, "1" decimal,"2" decimal, -- ... 86400 decimal values
-- each column index is the second of the day
PRIMARY KEY ((id, date))
);
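For comparison, in the skinny-row table all measurements for one id and day go into a single row, and each measurement only sets the column named after its second of the day. A write might look roughly like this (again only illustrative; in practice only the seconds that actually carry data are set):

-- hypothetical sample write into the skinny-row table:
-- one row per (id, date), one column per second of the day
-- (here the seconds 60, 120 and 180)
INSERT INTO timeseries (id, date, "60", "120", "180")
VALUES (1, '2015-06-01', 0.0175, 0.0349, 0.0523);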
Test:
- 10 different ids
- 1 million values (100,000 per id)
- the timestamp of each value advances by one minute
In my test with a sine function, the skinny-row approach consumed only half the storage of the wide-row approach for 1 million values. Even with random values the difference is still significant. Can somebody explain this behaviour?