
I'm linking two MongoDB collections, say A and B, by storing B's ObjectId in A (say A.bId). How do I find all instances of A where bId is no longer valid (the corresponding B doesn't exist)?

Based on answers and official documentation, it's clear that MongoDB doesn't enforce references. So I need an efficient way to find invalid A's in order to purge them.

A and B are large collections, so I'd like to avoid:

  • Loading all of B's ids into memory and doing a $nin
  • Iterating through all A's and doing B.findById(A.bId)
phillee

2 Answers


MongoDB does not do joins - period. There is no command which affects more than one collection. You will have to check for references to every single object individually.
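Checking every object individually does not have to mean one round trip per document. Below is a minimal sketch in Python, assuming pymongo-style collection objects whose `find(filter, projection)` returns an iterable of dicts. The function name `iter_orphan_ids`, the `batch_size` parameter, and the batching strategy are illustrative assumptions, not something MongoDB provides. It streams A and, for each batch, asks B which of the referenced ids still exist.

```python
from itertools import islice


def iter_orphan_ids(a_coll, b_coll, batch_size=1000):
    """Yield the _id of every A document whose bId matches no B document.

    a_coll and b_coll are assumed to behave like pymongo collections:
    find(filter, projection) returns an iterable of dicts.
    """
    # Stream A, fetching only the two fields needed for the check.
    cursor = iter(a_coll.find({}, {"_id": 1, "bId": 1}))
    while True:
        batch = list(islice(cursor, batch_size))
        if not batch:
            break
        # One query against B per batch; the lookup uses B's _id index.
        wanted = list({doc["bId"] for doc in batch if "bId" in doc})
        found = {
            doc["_id"]
            for doc in b_coll.find({"_id": {"$in": wanted}}, {"_id": 1})
        }
        for doc in batch:
            # Assumption: an A document without a bId counts as an orphan too.
            if doc.get("bId") not in found:
                yield doc["_id"]
```

Memory use is bounded by `batch_size` rather than by the size of B. The yielded ids can then be purged in chunks, for example with `a_coll.delete_many({"_id": {"$in": chunk}})`.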

If you want to avoid this in the future, consider embedding instead of referencing. In many cases where you would delete orphaned objects, the relationship is one of composition rather than aggregation: the children are an integral part of the parent and meaningless without it. This is often expressed best by making the children an array of sub-objects in the parent.
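To make the two layouts concrete, here is a small sketch using plain Python dicts in place of real documents. The helper `embed_children`, the `items` array name, and the sample fields are hypothetical and only illustrate the shape of the data.

```python
def embed_children(parent, children, ref_field="bId", array_field="items"):
    """Return a copy of parent with its referencing children embedded."""
    embedded = [
        # The back-reference is redundant once the child lives in the parent.
        {k: v for k, v in child.items() if k != ref_field}
        for child in children
        if child.get(ref_field) == parent["_id"]
    ]
    return {**parent, array_field: embedded}


# Referencing layout: each A points at a B through bId.
b_doc = {"_id": "b1", "name": "order 17"}
a_docs = [
    {"_id": "a1", "bId": "b1", "sku": "pen"},
    {"_id": "a2", "bId": "b1", "sku": "ink"},
    {"_id": "a3", "bId": "b2", "sku": "cap"},
]

# Embedding layout: deleting this one document removes its children with it.
b_embedded = embed_children(b_doc, a_docs)
```

With the embedded layout there is nothing left to orphan, since the children cannot outlive the parent. It only fits when the array stays comfortably within MongoDB's 16 MB document size limit.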

Philipp

I would solve this by writing my application code so that any invalid documents in A are removed as soon as they become invalid. This is different from your idea of writing a script to find and remove all invalid documents in A at the same time.

The first step would be asking yourself "At what point(s) in my application can a document in the A collection become invalid?" Say the answer is "Whenever a document from the B collection is removed." Then I would code a function to run post-delete to remove any invalid documents in A. This can be done quickly if you set up the document structure and indexes correctly.
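A post-delete function along these lines might look like the following sketch. It assumes pymongo-style collections, where `delete_one` and `delete_many` return a result object with a `deleted_count` attribute; the function name `delete_b_and_dependents` is hypothetical.

```python
def delete_b_and_dependents(a_coll, b_coll, b_id):
    """Delete one B document, then every A document that references it.

    Returns the number of A documents removed.
    """
    if b_coll.delete_one({"_id": b_id}).deleted_count == 0:
        return 0  # no B was deleted, so no A became invalid
    # With an index on bId this is an index lookup, not a collection scan.
    return a_coll.delete_many({"bId": b_id}).deleted_count
```

The index mentioned in the comment would be created once, for example with `a_coll.create_index("bId")`. The two deletes are separate operations here, so a crash between them can still leave orphans behind.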

The bigger question to ask yourself is whether your data model suits MongoDB at all. If you're referencing data in one collection from another and need to JOIN that data frequently, perhaps your data models aren't structured efficiently for MongoDB, or perhaps you should move over to a relational database system completely.

Travis