1

We need to cache records for a service with a terrible API.

This service provides us with API to query for data about our employees, but does not inform us whether employees are new or have been updated. Nor can we filter our queries to them for this information.

Our proposed solution to the problems this creates for us is to periodically (e.g. every 15 minutes) query all our employee data and upsert it into a Mongo database. Then, when we write to the MongoDb, we would like to include an additional property which indicates whether the record is new or whether the record has any changes since the last time it was upserted (obviously not including the field we are using for the timestamp).

The idea is, instead of querying the source directly, which we can't filter by such timestamps, we would instead query our cache which would include said timestamp and use it for a filter.

(Ideally, we'd like to write this in C# using the MongoDb driver, but more important right now is whether we can do this in an upsert call or whether we'd need to load all the records into memory, do comparisons, and then add the timestamps before upserting them....)

Brian Kessler
  • 2,187
  • 6
  • 28
  • 58

1 Answers1

1

There might be a way of doing that, but how efficient that is, still needs to be seen. The update command in MongoDB can take an aggregation pipeline to perform an update operation. We can use the $addFields stage of MongoDB to add a new field denoting the update status, and we can use $function to compute its value. A short example is:

db.collection.update({
  key: 1
},
[
  {
    "$addFields": {
      changed: {
        "$function": {
          lang: "js",
          "args": [
            "$$ROOT",
            {
              "key": 1,
              data: "somedata"
            }
          ],
          "body": "function(originalDoc, newDoc) { return JSON.stringify(originalDoc) !== JSON.stringify(newDoc) }"
        }
      }
    }
  }
],
{
  upsert: true
})

Here's the playground link.

Some points to consider here, are:

  1. If the order of fields in the old and new versions of the doc is not the same then JSON.stringify will fail.

  2. The function specified in $function will run on the server-side, so ideally it needs to be lightweight. If there is a large number of users, that will get upserted, then it may or may not act as a bottleneck.

Charchit Kapoor
  • 8,934
  • 2
  • 8
  • 24
  • 1
    Looks very promising. Not so worried about other users since nobody else will be using this database. Nor terribly concerned about performance.... I'm a little concerned about order of fields, but I think this should generally be stable. I'll leave this open for a bit to see if anyone has a more efficient way, but this looks very promising. Cheers! – Brian Kessler Aug 09 '22 at 12:42
  • 1
    Cool, also if number of fields in the user model are less, we can directly compare them if the order is a problem – Charchit Kapoor Aug 09 '22 at 14:56
  • 1
    You will also need to remove the timestamp field from `JSON.stringify` – Charchit Kapoor Aug 09 '22 at 15:23
  • not sure what you mean by "less" in this context, but I don't think that will be the case. Actually, the model we receive is rather a mess of deeply nested objects which will even include a lot of data we can't control and don't care about. And we wouldn't want the solution to break if/when they decide we should receive additional fields.... – Brian Kessler Aug 09 '22 at 15:25
  • good point about the timestamp.... I think that will probably be easier to do before the stringify, so we can just drop it as a property rather than try to find a string to remove.... – Brian Kessler Aug 09 '22 at 15:26
  • 1
    If you can guarantee the order will be same each time, then the solution will work – Charchit Kapoor Aug 09 '22 at 15:27
  • 1
    Yess, before stringifying, we can delete it., like `delete originalDoc.timestamp` – Charchit Kapoor Aug 09 '22 at 15:28
  • 1
    I think what we receive from them should be "stable enough"... if not, at a cost of efficiency, we could regenerate the Json objects, alphabetizing the keys or some such.... – Brian Kessler Aug 09 '22 at 15:31
  • 1
    Yeah yeah, which ever way is simpler – Charchit Kapoor Aug 09 '22 at 15:33