
Let me explain my issue, which is unlikely to happen but possible.

I have two collections, A and B. B is a "child" of A, so A has a field b_id. I want to read A together with its B child in a consistent state. The thing I'm worried about is this:

A (in state 0) read
B changed (from state 0 to state 1)
B (in state 1) read

What I want is:

A (in state 0) read
B changed (from state 0 to state 1)
B (in state 0) read

To help picture it: once I start the reading "process", I want a consistent "snapshot" of the database, so even if B changes in the few milliseconds between the read of A and the read of B, I will read B in the state it was in when I started reading.

It's pretty hard to test, so this is more of a theoretical question. I researched populating and aggregation, but it seems even those methods work sequentially, so they could still fail in the case above; I'm not sure. Which method (if one exists) could solve this problem?

Gergő Horváth

1 Answer


The short answer is no.

MongoDB does not keep historical values of data; once a document has been updated, there is no way to read its previous version unless you set up a delayed secondary.

There are transactions and snapshots, but they address different issues and won't help in your case.

You can minimise the time between the reads from the 2 collections by using a $lookup aggregation stage to query both collections in the same query. It does not guarantee that the document in collection B won't change during the query, but it significantly reduces the time between the two reads, so there is a much smaller chance of reading an updated version.
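A minimal sketch of such a pipeline, written as a plain Python function that returns the stages. The collection names "A" and "B" come from the question; joining on B's `_id`, the output field name `b`, and the helper name `build_lookup_pipeline` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_lookup_pipeline(a_id: Any) -> List[Dict[str, Any]]:
    """Build a pipeline that reads one A document together with its B child."""
    return [
        # Select the single parent document from collection A.
        {"$match": {"_id": a_id}},
        # Join the child from collection B where A.b_id == B._id (assumed key).
        {"$lookup": {
            "from": "B",
            "localField": "b_id",
            "foreignField": "_id",
            "as": "b",
        }},
        # $lookup produces an array; flatten it into one embedded document
        # and keep the parent even when no child matched.
        {"$unwind": {"path": "$b", "preserveNullAndEmptyArrays": True}},
    ]
```

With PyMongo this would be run as `db.A.aggregate(build_lookup_pipeline(some_id))`. As said above, it only narrows the window between the two reads; it is not a consistency guarantee.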

If that is not good enough, e.g. if your whole business depends on this guarantee or if good people die because of such a read inconsistency, you will need to implement it at the application level.

There are 2 options off the top of my head.

The first option is to use modification timestamps. Ensure these timestamps are monotonically updated on every data modification, e.g. with $currentDate. At query time, get the current timestamp before the first query and add it to the filter of both queries, so that documents modified after this moment won't match the filter. You will need to implement post-query validation logic to check whether an empty response means there are no such documents at all or that a document was updated, retry the read, etc.

The second option is to rely on change streams to keep the "recent state" of the documents in a change-stream client: a service that keeps a history of updated documents from collection B for some time. In this setup you don't need to change anything in your document structure or queries, but you make a validation request to this change-stream service to confirm that there were no changes and that your read is consistent.

The latter is a more complex setup, and you will need to take change-stream time lag into account. The former is simpler and more reliable, as long as all writes to the database properly update the modification timestamp.
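The timestamp option could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the field name `updated_at`, the helpers `cutoff_filter` and `read_pair`, and the `InconsistentRead` exception are all hypothetical. The cutoff is taken from the client clock, so it also assumes client and server clocks are reasonably in sync. The two `find` callables are injected so the logic stays independent of the driver.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional, Tuple

Doc = Dict[str, Any]
FindOne = Callable[[Dict[str, Any]], Optional[Doc]]


class InconsistentRead(Exception):
    """Raised when no consistent pair was read within the retry budget."""


def cutoff_filter(doc_id: Any, cutoff: datetime) -> Dict[str, Any]:
    """Match a document only if it was last modified at or before `cutoff`.

    `updated_at` is an assumed field that every write sets via $currentDate.
    """
    return {"_id": doc_id, "updated_at": {"$lte": cutoff}}


def read_pair(find_a: FindOne, find_b: FindOne, a_id: Any,
              retries: int = 3) -> Optional[Tuple[Doc, Doc]]:
    """Read A and its B child so that neither was modified after the cutoff.

    Returns None if A (or the B it points to) does not exist at all.
    """
    for _ in range(retries):
        # Take the timestamp before the first query.
        cutoff = datetime.now(timezone.utc)

        a = find_a(cutoff_filter(a_id, cutoff))
        if a is None:
            # Post-query validation: missing document or a concurrent update?
            if find_a({"_id": a_id}) is None:
                return None
            continue  # A was updated after the cutoff; retry

        b = find_b(cutoff_filter(a["b_id"], cutoff))
        if b is None:
            if find_b({"_id": a["b_id"]}) is None:
                return None
            continue  # B was updated after the cutoff; retry

        return a, b

    raise InconsistentRead(f"no consistent read after {retries} attempts")
```

With PyMongo the call might look like `read_pair(db.A.find_one, db.B.find_one, some_id)`. Note that a retry returns a newer consistent pair; it does not recover the old state of B.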

Alex Blex