We've set up a Bigtable cluster with 5 nodes, and the GCP console states that it should support 50K QPS at 6 ms latency for reads and writes.
We are trying to load a large dataset (~800M records), where each record has ~50 fields containing mostly numeric data and a few short strings. The row keys are 11-digit numeric strings.
When loading this dataset via the HBase API from a single client VM in GCE, we observe up to 4K QPS when putting each field into a separate column. We use a single HBase connection and multiple threads (5-30) doing batch puts of 10K records each.
When we combine all fields into a single column (Avro-encoded, ~250 bytes per record), write performance with batch puts improves to 10K QPS; the number of concurrent threads doesn't seem to affect QPS. When we use a separate HBase connection per thread, write performance increases to 20K QPS with 5 threads.
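For reference, here's a minimal sketch of our load loop (the per-thread-connection, single-column variant). The table name, column family, and record source below are placeholders rather than our real schema:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BigtableLoadTest {

    private static final byte[] FAMILY = Bytes.toBytes("f");       // placeholder column family
    private static final byte[] QUALIFIER = Bytes.toBytes("avro"); // single combined column
    private static final int BATCH_SIZE = 10_000;                  // records per batch put
    private static final int NUM_THREADS = 5;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
        for (int t = 0; t < NUM_THREADS; t++) {
            final int shard = t;
            pool.submit(() -> loadShard(shard));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    private static void loadShard(int shard) {
        // One HBase connection per thread: with a single shared connection we
        // plateau around 10K QPS; separate connections reach ~20K QPS at 5 threads.
        // Assumes hbase-site.xml is configured for the Bigtable HBase client.
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("loadtest"))) { // placeholder table
            List<Put> batch = new ArrayList<>(BATCH_SIZE);
            for (Record record : recordsForShard(shard)) {
                Put put = new Put(Bytes.toBytes(record.key));
                // Single-column variant; the 50-column variant calls addColumn
                // once per field instead.
                put.addColumn(FAMILY, QUALIFIER, record.avroBytes);
                batch.add(put);
                if (batch.size() == BATCH_SIZE) {
                    table.put(batch); // synchronous batch put of 10K records
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                table.put(batch);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical record source; the real loader reads our ~800M-record dataset.
    private static List<Record> recordsForShard(int shard) {
        List<Record> records = new ArrayList<>();
        for (int i = 0; i < BATCH_SIZE; i++) {
            String key = String.format("%011d", (long) shard * BATCH_SIZE + i); // 11-digit key
            records.add(new Record(key, new byte[250])); // stand-in for ~250-byte Avro payload
        }
        return records;
    }

    private static class Record {
        final String key;
        final byte[] avroBytes;
        Record(String key, byte[] avroBytes) {
            this.key = key;
            this.avroBytes = avroBytes;
        }
    }
}
```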
The client VM is in the same zone as the Bigtable cluster and stays almost idle during the load, so the bottleneck doesn't appear to be on the client side.
Questions:
- From our tests, it appears that write QPS decreases as the number of columns written per row increases. Is this expected, and how can this relationship be quantified? (BTW, it'd be great if this were mentioned in the Bigtable performance documentation.)
- What might we be missing in order to achieve the advertised write QPS? My understanding is that each cluster node should support 10K write QPS (50K across 5 nodes); however, it appears that we are effectively hitting a single node with a single HBase connection, and only 2 nodes with multiple HBase connections.