
I'm working on a Firebase Cloud Function that updates aggregate information on some documents in my DB. It's a very simple function that just adds 1 to a running count of documents, much like the example function found in the Firestore documentation.

I just noticed that when I created a single new document, the function was invoked twice. See the screenshot below and note that the logged document ID (iDup09btyVNr5fHl6vif) appears twice:

[Screenshot: function logs showing the document ID iDup09btyVNr5fHl6vif logged twice]

After a bit of digging around I found this SO post that says the following:

Delivery of function invocations is not currently guaranteed. As the Cloud Firestore and Cloud Functions integration improves, we plan to guarantee "at least once" delivery. However, this may not always be the case during beta. This may also result in multiple invocations for a single event, so for the highest quality functions ensure that the functions are written to be idempotent.

(From Firestore documentation: Limitations and guarantees)

This leads me to a problem with the documentation. As mentioned above, Cloud Functions are meant to be idempotent (in other words, the data they alter should end up the same whether the function runs once or multiple times). However, the example function I linked to earlier is, to my eyes, not idempotent:

exports.aggregateRatings = firestore
  .document('restaurants/{restId}/ratings/{ratingId}')
  .onWrite(event => {
    // Get value of the newly added rating
    var ratingVal = event.data.get('rating');

    // Get a reference to the restaurant
    var restRef = db.collection('restaurants').document(event.params.restId);

    // Update aggregations in a transaction
    return db.transaction(transaction => {
      return transaction.get(restRef).then(restDoc => {
        // Compute new number of ratings
        var newNumRatings = restDoc.data('numRatings') + 1;

        // Compute new average rating
        var oldRatingTotal = restDoc.data('avgRating') * restDoc.data('numRatings');
        var newAvgRating = (oldRatingTotal + ratingVal) / newNumRatings;

        // Update restaurant info
        return transaction.update(restRef, {
          avgRating: newAvgRating,
          numRatings: newNumRatings
        });
      });
    });
});

If the function runs once, the aggregate data is increased as if one rating is added, but if it runs again on the same rating it will increase the aggregate data as if there were two ratings added.
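To make the double count concrete, here is a small standalone sketch of the same arithmetic. It is not the documentation's code: `applyRating` is a hypothetical helper, and the starting values are made up for illustration.

```javascript
// Hypothetical helper mirroring the arithmetic of the example above:
// bump the count by one and fold the new rating into the average.
function applyRating(agg, ratingVal) {
  const numRatings = agg.numRatings + 1;
  const avgRating = (agg.avgRating * agg.numRatings + ratingVal) / numRatings;
  return { numRatings, avgRating };
}

// Made-up starting state: one existing rating of 4.
const before = { numRatings: 1, avgRating: 4 };

// The event for a single new rating of 5 is delivered once...
const once = applyRating(before, 5);
// ...and then delivered a second time for the same rating.
const twice = applyRating(once, 5);

console.log(once.numRatings);  // 2, which is correct
console.log(twice.numRatings); // 3, although only two ratings exist
```

The second result differs from the first, so applying the same event twice does not leave the data as it was after one application.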

Unless I'm misunderstanding the concept of idempotence, this seems to be a problem.

Does anyone have any ideas of how to increase / decrease aggregate data in Cloud Firestore via Cloud Functions in a way that is idempotent?

(And, of course, in a way that doesn't involve querying every single document the aggregate data covers.)

Bonus points: Does anyone know if functions will still need to be idempotent after Cloud Firestore is out of beta?

saricden

1 Answer


The Cloud Functions documentation gives some guidance on how to make retryable background functions idempotent. The bullet point you're most likely to be interested in here is:

Impose a transactional check outside the function, independent of the code. For example, persist state somewhere recording that a given event ID has already been processed.

The event parameter passed to your function has an eventId property that is unique to each event but stays the same when an event is retried. You should use this value to determine whether the action for an event has already been taken, so you know to skip it the second time, if necessary.

As for how exactly to check whether an event ID has already been processed by your function, there are many ways to do it, and that's up to you.
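One possible way, as a minimal sketch rather than a definitive implementation: record each event ID in a marker document and write that marker in the same transaction that updates the aggregate. The function name `applyRatingOnce` and the `processedEvents` collection are hypothetical, and `db` is assumed to be a Firestore instance from the Admin SDK.

```javascript
// Sketch: apply a rating to the restaurant aggregate at most once per event ID.
// Assumptions: `db` is an Admin SDK Firestore instance; the `processedEvents`
// collection and this function name are hypothetical, chosen for illustration.
async function applyRatingOnce(db, restId, eventId, ratingVal) {
  const restRef = db.doc(`restaurants/${restId}`);
  const markerRef = db.doc(`processedEvents/${eventId}`);

  return db.runTransaction(async (tx) => {
    // Firestore transactions require all reads to happen before any writes.
    const markerDoc = await tx.get(markerRef);
    const restDoc = await tx.get(restRef);

    // A marker means this event was already applied: skip the update.
    if (markerDoc.exists) {
      return false;
    }

    // Same arithmetic as the aggregation example, defaulting to zero
    // when the restaurant has no aggregate fields yet.
    const data = restDoc.data() || {};
    const oldNum = data.numRatings || 0;
    const oldAvg = data.avgRating || 0;
    const newNum = oldNum + 1;
    const newAvg = (oldAvg * oldNum + ratingVal) / newNum;

    // Marker and aggregate are written together, so either both are
    // committed or neither is.
    tx.set(markerRef, { restId });
    tx.set(restRef, { numRatings: newNum, avgRating: newAvg }, { merge: true });
    return true;
  });
}
```

With the event shape used in the question's example, the trigger might call this as `applyRatingOnce(db, event.params.restId, event.eventId, event.data.get('rating'))`; I believe newer versions of the SDK expose the same ID as `context.eventId`. Because the check and the update share one transaction, two overlapping deliveries of the same event should not both pass the check. The marker documents accumulate over time, so some cleanup policy would be needed.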

You can always opt out of making your function idempotent if you think it's simply not worthwhile, or it's OK to possibly have incorrect counts in some (probably rare) cases.

Doug Stevenson
  • I'm not using the counts for anything particularly important at the moment, so having slightly inaccurate counts wouldn't be the end of the world. However the eventid property sounds like something I can form a solution from, thank you very much! – saricden Mar 14 '18 at 18:17
  • This should be added to the docs where you show an example of a non-idempotent aggregation function (which many are probably following without knowing it has this flaw) https://firebase.google.com/docs/firestore/solutions/aggregation#solution_cloud_functions – cdock Sep 15 '18 at 02:28
  • We have been using a new doc with the `eventId` as the key successfully (if it exists, already processed). But that check seems to fail with batched writes to the same collection. The "does this doc exist" is very often returning `true` (b/c it _does_ in fact exist), but by all appearances it is 1st run of the function. – wtk Jan 18 '19 at 15:56
  • Sorry, I need to take back my comment above. Just found that we had 2 cloud funcs triggering from the same document writes, and both use the same "check event id document" logic. – wtk Jan 18 '19 at 17:25
  • How long does it take between one invocation and the next when there are multiple invocations? I still doubt this works if two invocations run at the same time; couldn't that still lead to an incorrect count when incrementing? – Xenolion Jul 23 '20 at 08:29
  • @Xenolion There is no documented guarantee about how that happens. I think there is a backoff, but you'd have to measure that for yourself. – Doug Stevenson Jul 23 '20 at 16:28