Inconsistent memcache lookups from different Google AppEngine modules of the same app

Question

We have come across a sporadic inconsistency when doing memcache lookups from different modules (aka "services") within a single AppEngine app (Python). The scenario is as such:

Module "portal" creates and later increments integer values for a certain memcache key:

@ndb.tasklet
def increment_total_async(meter):
    total = yield memcache.Client().incr_async(
        key="some-key",
        namespace=TOTALS_NAMESPACE,
        delta=meter.quantity)

    if total is not None:
        raise ndb.Return(total)

    raise ndb.Return(None)

Module "dashboard" later looks up the current value using the same key and namespace:

@ndb.transactional_tasklet
def get_total_async(subscription, metric):
    total = memcache.get(
        key="some-key",
        namespace=TOTALS_NAMESPACE)

    if total is not None:
        raise ndb.Return(total)

    raise ndb.Return(None)

Interestingly, get_total_async sometimes (rarely) returns an integer value of 0 (not None) when called from the "dashboard" module, while when called from the "portal" module it returns the (correct) value >0.

This is somewhat surprising and makes us believe that there may be a race condition in the memcache service - possibly only when one looks up the same key from different modules (now called "services" by Google).

Has anyone come across a similar phenomenon? This is a "Heisenbug", i.e., difficult to reproduce.

We were also wondering if the fact that we use "incr_async" (and not the synchronous "memcache.incr" call) may play a role. — Soeren, Feb 20 '17 at 23:44
The memcache values can dissapear at anytime. Can you show the code restoring the values in such case? — Dan Cornilescu, Feb 22 '17 at 16:39
Is `total` in `increment_total_async()` intended to be a numeric value or the `UserRPC` instance returned by async functions whos `get_result()` methods return a value? It is indeed strange that `get_total_async()` sometimes returns 0 but knowing how and when the value is set would be helpful. As you don't specify an `initial_value`, the `incr_async()` call should return `None` unless the value is set elsewhere. Lastly, during the instances where you saw this behavior, did it occur within a fairly tight time frame and is that why you suspect this may be a race condition? — Nicholas, Feb 22 '17 at 18:28
Is the counter reset after it returns 0 or does it restore itself on a subsequent read? Does "portal" start reporting the "bad" value after "dashboard" does? How does your code handle the case when a key is missing (eg. does it happen to reset it to 0)? Are Memcache accesses between "portal" and "dashboard" synchronized in some way (eg. is "dashboard" guaranteed to read only after "portal" writes)? — Adam, Mar 25 '17 at 22:48
There are a lot of unknowns which prevent any sound conclusions from being made, but the chances are the issue is not within the Memcache service as this isn't behavior that has been commonly observed elsewhere. — Adam, Mar 25 '17 at 22:48

Inconsistent memcache lookups from different Google AppEngine modules of the same app

0 Answers0