
I'm using a collection of BT tables to store data that's being used for both batch and realtime operations, and want to optimize performance, especially around latency of random access reads. And while I do know the underlying BT codebase fairly well, I don't know how all of that translates into best practices with Cloud Bigtable, which isn't quite the same as the underlying code. So I've got some questions for the experts:

(1) I've seen it noted in answers to other questions that Cloud BT stores all column families in a single locality group. Since I often have to read data from multiple column families in a single row, this is great for my needs... but I'm noticing a significant slowdown when reading N CF's rather than one CF in an operation. In this case, each cell is small (~1kB) and the total number of cells being read isn't big, so I'm not expecting this to be dominated by network latency, bandwidth bottlenecks, or the like; and the cells aren't being hammered by writes, so I'm not expecting an uncompacted log that's grown out of control. But:

  • Are there any general performance tips for this type of read pattern?
  • What are the major and minor compaction intervals used in cloud BT? Are these tunable?

(2) The read API does accept sparse sets of rows in a single read request. How much optimization is happening under the hood there? Is there some Cloud BT server within the instance that parallelizes these underlying operations across tabletservers, or does the Cloud BT API just go straight to the tabletservers? (Which is to say, is using this API actually more efficient than issuing the same reads one at a time in a for loop?)
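
For concreteness, here's a minimal sketch of the comparison I have in mind, using the Python client; the project, instance, table, and row key names are placeholders:

    from google.cloud import bigtable
    from google.cloud.bigtable.row_set import RowSet

    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")
    keys = [b"row-0001", b"row-0137", b"row-9999"]  # sparse, non-contiguous keys

    # (a) one batched request carrying a sparse row set
    row_set = RowSet()
    for key in keys:
        row_set.add_row_key(key)
    batched = {row.row_key: row for row in table.read_rows(row_set=row_set)}

    # (b) the same reads issued one at a time in a for loop
    looped = {key: table.read_row(key) for key in keys}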

(3) Related: I'm using the Python client library. Is there anything to know about how it parallelizes operations, or how parallelizable it is -- e.g., any gotchas with using it from multiple threads?

(4) Anything else I should know about how to make random reads scream?

(Footnote for future readers of this question who don't know the innards of BT: you can think of the entire table as divided vertically into locality groups, the locality groups into column families, and the column families into columns, and horizontally into tablets, which contain rows. Each locality group basically operates like an independent bigtable under the hood, but in cloud BT all your families are in a single LG, so this level of abstraction doesn't mean much. The horizontal split into tablets is done dynamically at regular intervals, to avoid hotspotting, so a single tablet may be as small as one row or as large as millions of rows.

Within each (locality group) * (tablet) rectangle of your table, the data is stored in the style of a journaling file system: there's a log file of recent writes (just "row, column, value" tuples, basically). Every minor compaction interval, a new log file is started, and the previous log file is converted into an SSTable, which is a file that stores a sorted map from string to string for efficient reads. Every major compaction interval, all the SSTables are combined into a single SSTable. So a single write to BT is just an append to the log, and a read has to check all the SSTables currently present, plus the log file. Thus if you're writing a lot to a tablet, reads on it get slower.

SSTables actually come in multiple formats optimized for various access patterns, like random access from spinning disk, batch access, and so on, so depending on those details, reading one of them can take 1-3 I/O operations against the underlying storage system, which is generally a distributed disk.)
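
If it helps to make that concrete, here's a toy model of the read/write path described above; this is only a sketch of the description, not actual Bigtable code, and the class and method names are invented for illustration:

    class ToyTablet:
        """Toy model of one (locality group) x (tablet) rectangle."""

        def __init__(self):
            self.log = {}        # recent writes: (row, column) -> value
            self.sstables = []   # older, immutable sorted maps

        def write(self, row, column, value):
            # A write is just an entry appended to the current log.
            self.log[(row, column)] = value

        def minor_compaction(self):
            # Freeze the current log into a new SSTable, start a fresh log.
            self.sstables.append(dict(self.log))
            self.log = {}

        def major_compaction(self):
            # Merge every SSTable into one; later tables win on conflicts.
            merged = {}
            for sstable in self.sstables:
                merged.update(sstable)
            self.sstables = [merged]

        def read(self, row, column):
            # A read checks the log, then every SSTable from newest to
            # oldest, which is why many un-merged SSTables slow reads down.
            if (row, column) in self.log:
                return self.log[(row, column)]
            for sstable in reversed(self.sstables):
                if (row, column) in sstable:
                    return sstable[(row, column)]
            return None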

Maxim

2 Answers


You asked a lot of questions :) I can give a tip on (1). The documentation mentions that

Store data you will access in a single query in a single column family.
Column qualifiers in a single column family have a physical as well as a logical relationship. In general, all of the column qualifiers in a single column family are stored together, accessed together and cached together. As a result, a query that accesses a single column family might execute more efficiently than a query spanning column families.

which seems in line with what you're experiencing. So if you're able to group the data into a single CF, it might help your read times.
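
As a rough illustration of how you might exploit that with the Python client (a minimal sketch; the project, instance, table, family, and key names are placeholders):

    from google.cloud import bigtable
    from google.cloud.bigtable import row_filters

    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")

    # Restrict the read to a single column family, so the server only has
    # to materialize cells from that family.
    one_family = row_filters.FamilyNameRegexFilter("cf1")
    row = table.read_row(b"some-row-key", filter_=one_family)

    # Compare with a read that touches every family (no family filter),
    # which is the pattern reported as slower in the question.
    row_all = table.read_row(b"some-row-key")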

Robert Lacok

There are a lot of sub-questions in this, so you might have better results breaking them out into separate questions. In the meantime, to attempt to answer some of them:

  1. The minor and major compaction intervals for Cloud Bigtable are unpublished, as they are subject to change. Based on the current GC Documentation, a garbage collection (major compaction) will happen within a week. As noted in the Compactions Documentation, these settings are not user-configurable.

  2. There isn't read parallelization on the Cloud Bigtable side. You would get better performance from parallelizing in your client (see the sketch after this list).

  3. I'm not overly familiar with the Python client, so I'll let others chime in on that. Note, however, that it's in beta, while the GA clients have had more performance tuning done on them.

  4. A well-thought-out Schema Design is the best bet for ensuring continued performance for a table. Additionally, using the Key Visualizer is effective in diagnosing any performance issues that arise, e.g. hotspotting.
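
For (2), a client-side fan-out might look roughly like the sketch below. This assumes the Python client; all the names are placeholders, and it gives each worker call its own client rather than sharing one across threads, which is a conservative choice rather than a documented requirement:

    from concurrent.futures import ThreadPoolExecutor

    from google.cloud import bigtable
    from google.cloud.bigtable.row_set import RowSet

    PROJECT, INSTANCE, TABLE = "my-project", "my-instance", "my-table"

    def read_chunk(keys):
        # One client (and hence one gRPC channel) per worker call, so no
        # client object is shared across threads.
        client = bigtable.Client(project=PROJECT)
        table = client.instance(INSTANCE).table(TABLE)
        row_set = RowSet()
        for key in keys:
            row_set.add_row_key(key)
        return {row.row_key: row.cells for row in table.read_rows(row_set=row_set)}

    all_keys = [("row-%08d" % i).encode() for i in range(10000)]
    chunks = [all_keys[i:i + 500] for i in range(0, len(all_keys), 500)]

    results = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        for partial in pool.map(read_chunk, chunks):
            results.update(partial)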

Dan