1

Say one document(Level 1) is supposed to have N(k) different kinds of items as sub-document arrays(Level 2), and these are stored in a separate collection(s?) due to lack of query optimisation for deep nested array sub docs. Now, we are supposed to fetch N(d1) level-1 docs, each has N(d2) number of sub docs (total count of level-2 docs for one level-1 doc is N(d2))

What are the pros and cons of having all those item kinds in a single collection vs a collection of each type. Taking into consideration the following points:

  • The storage engine used is WT
    • DB can read while other documents are being written to same collection. here.
    • The read-write lock is document level and not collection level as with previous cases. here.
  • Sparse Indexing is allowed, since the fields can differ from type to type
  • The deployment type is a sharded cluster
  • All query-able fields are indexed (not blindly though) => $or operator can use indexes
  • A point in favour of one approach automatically means it leads to degradation of performance in other approach.

How the query will work in case of Polymorphic collection:

  • All different types can be fetched in a single query(using $or) operation and then attached to their parent documents in the server app. We can also use aggregate $group to group one type of docs together. Thus,
    • decreasing the network induced query latency.
    • decreasing the number of connections the from the server.
  • Since, the sub-docs are attached back to their parent docs, it requires a for loop on the server side. Which has to loop over N(d1) docs at minimum - in case aggregation $group is used at query level - or N(d1) * N(d2) times if the sub-docs are not already grouped.

How the query will work in case each type has its own collection:

  • N(k) queries are made, as soon as a query returns we can attach the sub-doc to its parent doc.
  • We can, again, use aggregation $group and thereby limiting the server side loop to N(d1) iterations

In favour of a single collection:

  • Less number of connections used. ( consistent at 1 per query )
  • consistent latency. ( as an implication of earlier point )
  • ?

In favour of multiple:

  • An organisation structure we are more used to. ( subjective )
  • ?

How else could each case effect performance?

TarunG
  • 602
  • 5
  • 21
  • This question may be beyond the scope of Stack Overflow - you might find the expertise to answer it over at http://dba.stackexchange.com/ ? – Vince Bowdren Aug 20 '16 at 09:58
  • Thanks for the suggestion @VinceBowdren http://dba.stackexchange.com/questions/147388/performance-impact-of-having-polymorphic-collections-in-mongodb – TarunG Aug 20 '16 at 10:35

0 Answers0