3

Both MyRocks (MySQL) and Cassandra use an LSM architecture to store their data. I populated around 5 million rows in MySQL with MyRocks as the storage engine, and the same rows in Cassandra. In Cassandra the data takes only 1.7 GB of disk space, while in MySQL with MyRocks it takes 19 GB.

Am I missing something? Both use the same LSM mechanism, so why do they differ so much in data size?

Update:

I guess it has something to do with the text column. My table structure is (bigint, bigint, varchar, text).

  • Rows populated: 300 000
  • MyRocks - 185 MB
  • Cassandra - 13 MB

But if I remove the text column then:

  • MyRocks - 21.6 MB
  • Cassandra - 11 MB

Any idea about this behaviour?
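To put the numbers above on a per-row basis, here is a small arithmetic sketch. It only uses the sizes quoted in this question and assumes 1 MB = 10**6 bytes; the names `ROWS`, `sizes_mb` and `bytes_per_row` are made up for illustration.

```python
# Sizes reported in the question, for 300 000 rows.
# Assumption: 1 MB = 10**6 bytes (the tools may have reported MiB).
ROWS = 300_000
sizes_mb = {
    ("MyRocks", "with text"): 185.0,
    ("Cassandra", "with text"): 13.0,
    ("MyRocks", "without text"): 21.6,
    ("Cassandra", "without text"): 11.0,
}


def bytes_per_row(size_mb, rows=ROWS):
    """Average on-disk bytes per row for a given total size in MB."""
    return size_mb * 1_000_000 / rows


per_row = {key: bytes_per_row(mb) for key, mb in sizes_mb.items()}

for (engine, variant), value in per_row.items():
    print(f"{engine:9s} {variant:12s} {value:7.1f} bytes/row")

# On-disk bytes per row that the text column appears to add in each engine.
for engine in ("MyRocks", "Cassandra"):
    delta = per_row[(engine, "with text")] - per_row[(engine, "without text")]
    print(f"{engine:9s} text column adds about {delta:.1f} bytes/row")
```

The gap between the two engines comes almost entirely from the text column, which suggests the difference lies in how well that column is being compressed rather than in the LSM layout itself.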

James Z
Aravind

2 Answers

5

Well, the reason for the above behaviour is that rocksdb_block_size was set to 4 KB. Each data block is compressed on its own, so with smaller blocks the compressor has less data to work with and finds fewer repeated patterns. Setting it to 16 KB solved the issue. Now I get a data size similar to Cassandra's.
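The effect of block size on compression can be reproduced outside the database. The following is only a rough sketch: it uses Python's `zlib` as a stand-in compressor (MyRocks may be configured with a different algorithm), and the data is synthetic text built from a small made-up vocabulary, not my real table. The helper `compressed_size` is a hypothetical name for this illustration.

```python
import random
import zlib


def compressed_size(data: bytes, block_size: int) -> int:
    """Compress data in independent fixed-size blocks; return total compressed bytes."""
    return sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )


# Synthetic "text column" content (an assumption, not the real data):
# 2000 rows of 80 words each, drawn from a 200-word vocabulary.
rng = random.Random(0)
vocab = [f"word{i}" for i in range(200)]
rows = [" ".join(rng.choice(vocab) for _ in range(80)) for _ in range(2000)]
data = "\n".join(rows).encode()

size_4k = compressed_size(data, 4 * 1024)
size_16k = compressed_size(data, 16 * 1024)

print(f"raw size:          {len(data)} bytes")
print(f"4 KB blocks total:  {size_4k} bytes")
print(f"16 KB blocks total: {size_16k} bytes")
```

Because every block starts with no history, a 4 KB block spends a larger share of its bytes before the compressor has seen enough repeats to exploit. The trade-off is that larger blocks mean more data has to be read and decompressed per point lookup.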

Aravind
0

I'm not 100% sure about MyRocks, but Cassandra is both an LSM store and a key-value store, which means that if a column is null, it isn't stored on disk. A traditional RDBMS will still consume some space (varchars, null characters, pointers, etc.), so this may account for your lost space.

Additionally, Cassandra compresses data. Try: ALTER TABLE myTable WITH compression = { 'enabled' : false };

Highstead
  • Yeah, but I have populated values for all the columns yet the difference. And an update to the above question. I guess it has something to do with the text column. My Table Structure is (bigint,bigint,varchar,text). No Of rows Populated : 3 lakhs Now in MyRocks the size is 185MB In Cassandra - 13 MB. But if I remove the text column then, MyRocks - 21.6 MB Cassandra - 11 MB Any idea about this behaviour? – Aravind Nov 03 '17 at 06:25
  • What is the average size of the "text column"? 13MB/3lakh = 43 bytes/row -- not room for much text! – Rick James Nov 07 '17 at 15:03
  • @AravindTyson Additionally, Cassandra actually compresses data (as a heads up). If you `ALTER TABLE myTable WITH compression = { 'enabled' : false };` – Highstead Nov 08 '17 at 15:45