
Let's say I create a table with 10 initial split points and therefore 10 initial tablets, and after a while one of them gets to max size and is auto-split.

Assuming my key is partition_counter, and my counter keeps increasing, I'll be inserting into the newly created tablet, and never to the old one.

After all records in the old one hit their TTL (and a compaction takes place), the tablet would then be empty.

Is this tablet removed automatically? If not, are there any performance or cost implications of empty tablets sticking around?

Admittedly, we should strive to choose our # of partitions and TTL such that tablets never get big enough for an auto-split and the tablet count stays constant, but I'm trying to address all scenarios.

minimo

1 Answer


Let's say I create a table with 10 initial split points and therefore 10 initial tablets, and after a while one of them gets to max size and is auto-split.

Assuming my key is partition_counter, and my counter keeps increasing, I'll be inserting into the newly created tablet, and never to the old one.

You haven't provided enough information about your row key / schema design, so this is just a guess on my part. However, note that if you always write to a key that is larger than all existing keys in Bigtable, you will be hotspotting the node that holds the last tablet: since the writes can't be distributed, you will only ever get the performance of a single node, regardless of the size of your cluster.

Avoiding typical schema design pitfalls

  • Bad example: <date>-<some id/hash> — date-major order, poor scaling
  • Good example: <some id/hash>-<date> — items/users/devices-major order, good scaling

If you're not already using a date or timestamp as a prefix in your row keys, you're doing it correctly.
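
To make the contrast concrete, here is a minimal sketch of the two key orderings above. It is plain Python with hypothetical `device_id`/timestamp components, not tied to any particular Bigtable client.

```python
from datetime import datetime, timezone

def date_major_key(device_id: str, ts: datetime) -> bytes:
    # <date>-<id>: every new write sorts after all existing keys, so the
    # node holding the last tablet absorbs the entire write load.
    return f"{ts:%Y%m%d%H%M%S}-{device_id}".encode()

def id_major_key(device_id: str, ts: datetime) -> bytes:
    # <id>-<date>: writes are spread across as many key ranges as there
    # are devices, so all nodes in the cluster can share the load.
    return f"{device_id}-{ts:%Y%m%d%H%M%S}".encode()

now = datetime.now(timezone.utc)
print(date_major_key("sensor-042", now))  # e.g. b'20240101120000-sensor-042'
print(id_major_key("sensor-042", now))    # e.g. b'sensor-042-20240101120000'
```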

After all records in the old one hit their TTL (and a compaction takes place), the tablet would then be empty.

Is this tablet removed automatically? If not, are there any performance or cost implications of empty tablets sticking around?

Yes, such a tablet will disappear at compaction time and your cluster won't even notice. There is no performance or cost implication of empty tablets, and you shouldn't worry about them.
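
For completeness, here is a rough sketch of how the setup described in the question might look with the Python google-cloud-bigtable client: a table created with initial split points and a max-age (TTL) garbage-collection rule on a column family. The project, instance, table, and family names and the split boundaries are all placeholders, not something from the question.

```python
import datetime
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("events")

# TTL is a garbage-collection rule on a column family: expired cells stop
# being returned by reads right away, but the storage (and eventually an
# empty tablet) is only reclaimed when a compaction runs.
seven_day_ttl = column_family.MaxAgeGCRule(datetime.timedelta(days=7))

# Pre-split the table at hypothetical partition boundaries
# (9 split points here, yielding 10 initial tablets).
table.create(
    initial_split_keys=[f"{i:02d}#".encode() for i in range(1, 10)],
    column_families={"cf1": seven_day_ttl},
)
```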

Admittedly, we should strive to choose our # of partitions and TTL such that tablets never get big enough for an auto-split and the tablet count stays constant, but I'm trying to address all scenarios.

No, you shouldn't worry about the # of tablets, # of partitions, or TTL. There is no requirement to avoid splits or to keep the number of tablets constant. That may be an issue in other storage systems, but not in Bigtable: it scales very well.

You should only ensure that your schema is designed so that, as your data grows, reads and writes are distributed across the keyspace (which is sharded across nodes) rather than concentrated on monotonically increasing or decreasing keys.
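
If the counter portion of your key is what keeps increasing, one common way to keep writes distributed is to promote a short hash of the partition to the front of the key (key salting). The sketch below assumes the Python google-cloud-bigtable client and a hypothetical partition#counter scheme; names such as "cf1", the instance/table names, and the salt length are placeholders, not part of the original question.

```python
import hashlib
import struct
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("events")

def row_key(partition: int, counter: int) -> bytes:
    # A short hash of the partition at the front of the key spreads the
    # partitions evenly across the keyspace, so concurrent writers don't
    # all append past the current maximum key.
    salt = hashlib.md5(str(partition).encode()).hexdigest()[:4]
    # A fixed-width big-endian counter keeps rows within a partition in
    # insertion order, preserving efficient per-partition range scans.
    return f"{salt}#{partition}#".encode() + struct.pack(">Q", counter)

row = table.direct_row(row_key(partition=7, counter=123456))
row.set_cell("cf1", "payload", b"...")
row.commit()
```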

For more info, please see these docs:

Also, if your use case fits into a popular pattern, consider using a frontend to Bigtable that simplifies schema management for specific use cases, e.g.,

Misha Brukman