In my case I have a table structure like this:
CREATE TABLE table_1 (
    entity_uuid text,
    fk1_uuid text,
    fk2_uuid text,
    int_timestamp bigint,
    cnt counter,
    PRIMARY KEY (entity_uuid, fk1_uuid, fk2_uuid, int_timestamp)
);
The text columns are made up of random strings. However, only entity_uuid is truly random and evenly distributed. fk1_uuid and fk2_uuid have much lower cardinality and may be sparse (sometimes fk1_uuid = null or fk2_uuid = null).
In this case, I can either define only entity_uuid as the partition key, or make the (entity_uuid, fk1_uuid, fk2_uuid) combination a composite partition key.
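To make the two options concrete, here is a minimal sketch of each variant (the table names table_1_a and table_1_b are placeholders; the extra parentheses in the second PRIMARY KEY are what turn the three columns into one composite partition key):

-- Variant A: entity_uuid alone is the partition key;
-- fk1_uuid, fk2_uuid and int_timestamp are clustering columns.
CREATE TABLE table_1_a (
    entity_uuid text,
    fk1_uuid text,
    fk2_uuid text,
    int_timestamp bigint,
    cnt counter,
    PRIMARY KEY (entity_uuid, fk1_uuid, fk2_uuid, int_timestamp)
);

-- Variant B: the three columns together form the composite partition key;
-- only int_timestamp is a clustering column.
CREATE TABLE table_1_b (
    entity_uuid text,
    fk1_uuid text,
    fk2_uuid text,
    int_timestamp bigint,
    cnt counter,
    PRIMARY KEY ((entity_uuid, fk1_uuid, fk2_uuid), int_timestamp)
);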
Also, this is a LOOKUP-type table, meaning we don't plan to do any aggregation or slice-and-dice queries against it. Rows will be rotated out over time, since every row is inserted with a TTL.
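For illustration, the intended access pattern looks roughly like the sketch below. The values are hypothetical, and the TTL insert assumes a non-counter variant of the table (called table_1_plain here, with cnt as a plain bigint), since to my knowledge CQL counter columns are written with UPDATE rather than INSERT and do not accept a TTL:

-- Hypothetical write: every row carries a 7-day TTL (604800 seconds).
-- table_1_plain is an assumed non-counter variant, used only for this sketch.
INSERT INTO table_1_plain (entity_uuid, fk1_uuid, fk2_uuid, int_timestamp, cnt)
VALUES ('e-0001', 'f1-aa', 'f2-bb', 1427920000000, 1)
USING TTL 604800;

-- Hypothetical lookup: fetch a single row by its full key, no aggregation.
SELECT cnt FROM table_1_plain
WHERE entity_uuid = 'e-0001'
  AND fk1_uuid = 'f1-aa'
  AND fk2_uuid = 'f2-bb'
  AND int_timestamp = 1427920000000;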
Can someone enlighten me:
- What is the downside of having very many partitions with only a few rows in each? Is there a cost at the storage-engine level?
- My understanding is that clustering keys are ALWAYS kept sorted. Does that mean having text columns as clustering keys will always incur a tree-balancing cost?
Well, you can probably tell where my heart lies by now. However, when all the rows in a partition have TTL-ed out, does that partition still live on, or is there a way for the DB engine to remove it as well?
Thanks,
Bing