I have a table t1 which gives me a Compression Ratio of 0.25:

CREATE TABLE t1(
id varchar,
c2 text,
c3 float,
c4 float,
c5 float,
c6 text,
c7 text,
c8 text,
PRIMARY KEY ((id),c2, c3, c4, c5, c6, c7, c8)
);

and an almost identical table t2 which gives me a Compression Ratio of 0.65:

CREATE TABLE t2(
id varchar,
extraid varchar,
c2 text,
c3 float,
c4 float,
c5 float,
c6 text,
c7 text,
c8 text,
PRIMARY KEY ((id),extraid)
);

As you can see, the only change is that I replaced the 7 clustering columns in the primary key with a single column. That alone moves the compression ratio from 0.25 to 0.65. Any idea why this happens?

Erick Ramirez
Des0lat0r
    A friendly reminder that Stack Overflow is for getting help with coding, algorithm, or programming language problems. I have cast a vote to have your post moved to https://dba.stackexchange.com instead. Cheers! – Erick Ramirez Oct 25 '22 at 06:57

1 Answer

Yes, it can, mostly because the way the data is stored on disk depends heavily on the table schema.

You haven't provided details of how you arrived at the compression ratios but I suspect you're not comparing apples-for-apples. For example, not all SSTables are equal. Again, without details of your test it's impossible to know. Cheers!
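To illustrate how row ordering and key contents alone can shift a ratio, here is a toy sketch in Python. It does not model Cassandra's SSTable format: it uses `zlib` as a stand-in compressor, the column values are made up, and it assumes `extraid` is a random 16-hex-digit string, which the question does not state. It only shows that the same values compress differently when rows are sorted by all seven columns versus sorted by an unrelated random key.

```python
import random
import zlib


def compression_ratio(rows):
    """Return compressed size / raw size for rows serialized one per line."""
    raw = "\n".join("|".join(row) for row in rows).encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def compare_layouts(n=5000, seed=0):
    """Return (t1-like ratio, t2-like ratio) for the same synthetic values."""
    rng = random.Random(seed)
    # Hypothetical low-cardinality values standing in for c2..c8.
    values = [
        (
            rng.choice(["alpha", "beta", "gamma"]),  # c2
            f"{rng.randint(0, 9)}.0",                # c3
            f"{rng.randint(0, 9)}.0",                # c4
            f"{rng.randint(0, 9)}.0",                # c5
            rng.choice(["red", "green", "blue"]),    # c6
            rng.choice(["north", "south"]),          # c7
            rng.choice(["x", "y", "z"]),             # c8
        )
        for _ in range(n)
    ]
    # t1-like layout: rows ordered by all seven clustering columns,
    # so neighbouring rows share long prefixes.
    t1_rows = sorted(values)
    # t2-like layout: rows ordered by a random extraid (an assumption),
    # so the remaining columns appear in effectively random order.
    t2_rows = sorted((f"{rng.getrandbits(64):016x}",) + v for v in values)
    return compression_ratio(t1_rows), compression_ratio(t2_rows)


ratio_t1, ratio_t2 = compare_layouts()
print(f"t1-like ratio: {ratio_t1:.2f}, t2-like ratio: {ratio_t2:.2f}")
```

In this sketch the t1-like layout yields the smaller (better) ratio. The absolute numbers mean nothing for a real cluster; compare the actual tables using the same measurement method on both.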

Erick Ramirez
    The truth is I have some big partitions, but the columns hold the same values in both tables, with no nulls. What kind of details do you mean? The data was inserted the same way, through batch statements. I think it is simply that columns that are part of the primary key are stored and compressed together, so the ratio is smaller (meaning better) and reads are faster (because clustering columns are organized together). – Des0lat0r Oct 25 '22 at 07:53