
I'm using sharded counters (https://cloud.google.com/appengine/articles/sharding_counters) in my GAE application for performance reasons, but I'm having some trouble understanding why it's so slow and how I can speed things up.

Background
I have an API that grabs a set of 20 objects at a time and for each object, it gets a total from a counter to include in the response.
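
To illustrate the pattern (a simplified sketch, not my actual handler; the object and field names here are made up, and get_count is from the sharding article):

def build_response(objects):
    # objects is one batch of 20 entities; each item in the response
    # includes the current total from that object's sharded counter
    results = []
    for obj in objects:
        item = obj.to_dict()
        item['total'] = get_count(obj.counter_name)  # one counter read per object
        results.append(item)
    return results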

Metrics
With Appstats turned on and a clear cache, I notice that getting the totals for 20 counters makes 120 datastore_v3.Get RPCs, which take 2500ms.

Thoughts
This seems like quite a lot of RPC calls and quite a bit of time for reading just 20 counters. I assumed this would be faster and maybe that's where I'm wrong. Is it supposed to be faster than this?

Further Inspection
I dug into the stats a bit more, looking at these two lines in the get_count method:

all_keys = GeneralCounterShardConfig.all_keys(name)
for counter in ndb.get_multi(all_keys):
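
For reference, here is roughly the surrounding code, paraphrased from the sharding article (imports omitted). Note that all_keys itself does a datastore read via get_or_insert, which accounts for Get RPCs even before get_multi runs:

def get_count(name):
    """Retrieve the value for a given sharded counter."""
    total = memcache.get(name)
    if total is None:
        total = 0
        all_keys = GeneralCounterShardConfig.all_keys(name)
        for counter in ndb.get_multi(all_keys):
            if counter is not None:
                total += counter.count
        memcache.add(name, total, 60)
    return total

class GeneralCounterShardConfig(ndb.Model):
    """Tracks the number of shards for each named counter."""
    num_shards = ndb.IntegerProperty(default=20)

    @classmethod
    def all_keys(cls, name):
        """Returns all possible keys for the counter name given the config."""
        config = cls.get_or_insert(name)  # one datastore_v3.Get per counter name
        shard_key_strings = [SHARD_KEY_TEMPLATE.format(name, index)
                             for index in range(config.num_shards)]
        return [ndb.Key(GeneralCounterShard, shard_key_string)
                for shard_key_string in shard_key_strings]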

If I comment out the get_multi line, I see that there are 20 RPC calls by datastore_v3.Get totaling 185ms.

As expected, this leaves get_multi as the culprit for the remaining 100 datastore_v3.Get RPCs, taking upwards of 2500ms. I verified this, but this is where I'm confused: why does calling get_multi with 20 keys cause 100 RPC calls?

Update #1
I checked out Traces in the GAE console and saw some additional information. They show a breakdown of the RPC calls there as well, but in the insights they say to "Batch the gets to reduce the number of remote procedure calls." I'm not sure how to do that outside of using get_multi; I thought that did the job. Any advice here?

Update #2
Here are some screenshots that show the stats I'm looking at. The first one is my baseline: the function without any counter operations. The second one is after a call to get_count for just one counter. This shows a difference of 6 datastore_v3.Get RPCs.

Baseline (screenshot)

After Calling get_count On One Counter (screenshot)

Update #3
Based on Patrick's request, I'm adding a screenshot of info from the console Trace tool. (screenshot)

  • How many keys in all_keys? – Tim Hoffman Jan 07 '16 at 08:56
  • It appears that you call get_multi in a loop, but we cannot see it from a small code snippet that you provided. – Andrei Volgin Jan 07 '16 at 10:38
  • Hi @TimHoffman, it's 20 keys. – Brandon Jan 07 '16 at 13:48
  • Hi @AndreiVolgin, that's correct. – Brandon Jan 07 '16 at 13:49
  • It should be 1 call, not 120 calls. There is a problem somewhere. – Andrei Volgin Jan 07 '16 at 14:00
  • That's what I'm thinking. I've added some stats before and after for calling get_count on one counter in the original post. – Brandon Jan 07 '16 at 14:11
  • Unfortunately, ndb's `get_multi` isn't a straightforward "execute a single RPC" call. If you take a look at how [get_multi()](https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/3360752d371e84d9d3433be97a75324f267ec8f8/ndb/model.py#L3899) is implemented, you'll see it takes advantage of ndb's [AutoBatcher](https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/master/ndb/context.py#L132). By default, ndb should be willing to batch up to 1000 `Get`s together, so it **should** be happy to process all of them for you. – Patrick Costello Jan 08 '16 at 17:40
  • In the traces in the console, if you click on the `datastore_v3.Get` trace it should show you how many keys are involved in each call. As a side note, I would disable appstats before doing this, as Appstats also writes to the Datastore so it can be harder to track down any bugs. Can you let us know what those traces look like? – Patrick Costello Jan 08 '16 at 17:40
  • Will do @PatrickCostello. Thanks! – Brandon Jan 08 '16 at 20:05
  • Hey @PatrickCostello, I added the screenshot from Traces. Looks like 10 keys, right? I made sure to turn off Appstats. There are a ton of batches of 5 datastore_v3.Gets. – Brandon Jan 08 '16 at 21:59
  • Interesting. Can you file a bug on ndb's github page? https://github.com/GoogleCloudPlatform/datastore-ndb-python/issues If possible, can you include a full code snippet? – Patrick Costello Jan 08 '16 at 23:14

2 Answers


Try separating the for loop that goes through each item from the actual get_multi call itself. Something like:

all_values = ndb.get_multi(all_keys)
for counter in all_values:
    # Insert amazeballs code here

I have a feeling it's one of these:

  1. The generator pattern (yield from for loop) is causing something funky with get_multi execution paths
  2. Perhaps the number of items you are expecting doesn't match actual result counts, which could reveal a problem with GeneralCounterShardConfig.all_keys(name)
  3. The number of shards is set too high. I've realized that anything over 10 shards causes performance issues (see the sketch below for capping it).
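
For the shard count, a rough sketch of checking and capping it (this assumes the article's GeneralCounterShardConfig model, where num_shards defaults to 20):

config = GeneralCounterShardConfig.get_or_insert(name)
if config.num_shards > 10:
    # Only safe for counters that haven't accumulated data yet:
    # get_count reads keys for range(num_shards), so shrinking an
    # existing counter silently drops the counts in the higher shards.
    config.num_shards = 10
    config.put()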
– Bardia D.
  • I tried removing the get_multi from loop and it has the same problem, but I haven't tried limiting the number of shards yet. I'll give that a shot. – Brandon Jan 07 '16 at 15:56

When I've dug into similar issues, one thing I've learned is that get_multi can cause multiple RPCs to be sent from your application. It looks like the default in the SDK is set to 1000 keys per get, but the batch size I've observed in production apps is much smaller: something more like 10 (going from memory).

I suspect the reason it does this is that at some batch size, it actually is better to use multiple RPCs: there is more RPC overhead for your app, but there is more Datastore parallelism. In other words: this is still probably the best way to read a lot of datastore objects.

However, if you don't need to read the absolute most current value, you can try setting the db.EVENTUAL_CONSISTENCY option, but that seems to only be available in the older db library and not in ndb. (Although it also appears to be available via the Cloud Datastore API).
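
With db, it would look something like the following sketch. Per the SDK code quoted below, max_egs_per_rpc only applies when the read must be current, so an eventually consistent read should avoid that per-entity-group splitting:

from google.appengine.ext import db

# is_read_current is False for this read, so the max_egs_per_rpc
# limit (see the snippet under Details) no longer forces extra RPCs
entities = db.get(keys, read_policy=db.EVENTUAL_CONSISTENCY)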

Details

If you look at the Python code in the App Engine SDK, specifically the file google/appengine/datastore/datastore_rpc.py, you will see the following lines:

max_count = (Configuration.max_get_keys(config, self.__config) or
             self.MAX_GET_KEYS)
...

if is_read_current and txn is None:
  max_egs_per_rpc = self.__get_max_entity_groups_per_rpc(config)
else:
  max_egs_per_rpc = None

...

pbsgen = self._generate_pb_lists(indexed_keys_by_entity_group,
                                 base_req.ByteSize(), max_count,
                                 max_egs_per_rpc, config)

rpcs = []
for pbs, indexes in pbsgen:
  rpcs.append(make_get_call(base_req, pbs,
                            self.__create_result_index_pairs(indexes)))

My understanding of this:

  • Set max_count from the configuration object, or 1000 as a default
  • If the request must read the current value, set max_egs_per_rpc from the configuration, or 10 as a default
  • Split the input keys into individual RPCs, using both max_count and max_egs_per_rpc as limits.

So, this is being done by the Python Datastore library.
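
As a concrete illustration of that splitting (my arithmetic, assuming the defaults above and that each counter shard is a root entity, i.e. its own entity group):

import math

keys_per_call = 20    # shard keys passed to a single ndb.get_multi
max_egs_per_rpc = 10  # SDK default for strongly consistent reads
# 20 root entities = 20 entity groups, split 10 per RPC:
rpcs = int(math.ceil(keys_per_call / float(max_egs_per_rpc)))
print(rpcs)  # -> 2 datastore_v3.Get RPCs for that one get_multi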

– Evan Jones