Say one document(Level 1) is supposed to have N(k)
different kinds of items as sub-document arrays(Level 2), and these are stored in a separate collection(s?) due to lack of query optimisation for deep nested array sub docs.
Now, we are supposed to fetch N(d1)
level-1 docs, each has N(d2)
number of sub docs (total count of level-2 docs for one level-1 doc is N(d2)
)
What are the pros and cons of having all those item kinds in a single collection vs a collection of each type. Taking into consideration the following points:
- The storage engine used is WT
- Sparse Indexing is allowed, since the fields can differ from type to type
- The deployment type is a sharded cluster
- All query-able fields are indexed (not blindly though) =>
$or
operator can use indexes - A point in favour of one approach automatically means it leads to degradation of performance in other approach.
How the query will work in case of Polymorphic collection:
- All different types can be fetched in a single query(using
$or
) operation and then attached to their parent documents in the server app. We can also use aggregate$group
to group one type of docs together. Thus,- decreasing the network induced query latency.
- decreasing the number of connections the from the server.
- Since, the sub-docs are attached back to their parent docs, it requires a for loop on the server side. Which has to loop over
N(d1)
docs at minimum - in case aggregation$group
is used at query level - orN(d1) * N(d2)
times if the sub-docs are not already grouped.
How the query will work in case each type has its own collection:
N(k)
queries are made, as soon as a query returns we can attach the sub-doc to its parent doc.- We can, again, use aggregation
$group
and thereby limiting the server side loop toN(d1)
iterations
In favour of a single collection:
- Less number of connections used. ( consistent at 1 per query )
- consistent latency. ( as an implication of earlier point )
- ?
In favour of multiple:
- An organisation structure we are more used to. ( subjective )
- ?
How else could each case effect performance?