I want to store one value per second in a table. To decide on a schema, I tested two approaches against each other. If I have understood correctly, the data should be stored almost identically on disk.
Wide-Row
CREATE TABLE timeseries (
id int,
date date,
timestamp timestamp,
value decimal,
PRIMARY KEY ((id, date), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
  AND compaction = {'class': 'DateTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'DeflateCompressor'};
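To make the write pattern concrete, a single measurement in the wide-row table would be written roughly like this (the id, date and values are only illustrative):

-- hypothetical sample write into the wide-row table:
-- every measurement becomes its own clustering row under (id, date)
INSERT INTO timeseries (id, date, timestamp, value)
VALUES (1, '2015-06-01', '2015-06-01 00:01:00', 0.0175);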
Skinny Row
CREATE TABLE timeseries(
id int,
date date,
"0" decimal, "1" decimal,"2" decimal, -- ... 86400 decimal values
-- each column index is the second of the day
PRIMARY KEY ((id, date))
);
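For comparison, in the skinny-row table all measurements for one id and day go into a single row, and each measurement only sets the column named after its second of the day. A write might look roughly like this (again only illustrative; in practice only the seconds that actually carry data are set):

-- hypothetical sample write into the skinny-row table:
-- one row per (id, date), one column per second of the day
-- (here the seconds 60, 120 and 180)
INSERT INTO timeseries (id, date, "60", "120", "180")
VALUES (1, '2015-06-01', 0.0175, 0.0349, 0.0523);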
Test:
- 10 different ids
- 1 million values (100,000 per id)
- the timestamp of each value advances by one minute
In my test with a sine function, the skinny-row approach consumed only half the storage of the wide-row approach for 1 million values. Even with random values the difference is still significant. Can somebody explain this behaviour?