
I am building a web-based system for my organization using MongoDB. I have gone through the documentation provided by MongoDB and came to the following conclusions:

find: Cannot pull data from a sub-array.
group: Does not work in a sharded environment.
aggregate: Best for sub-arrays, but has performance issues when the data set is large.
Map Reduce: Too risky to write the map and reduce functions.

So, can someone help me out with the best approach for working with sub-array documents in a production environment with a sharded cluster?

Example:

{"testdata":{"studdet":[{"id","name":"xxxx","marks",80}.....]}}

now my "studdet" is a huge collection of more than 1000, rows for each document,

So suppose my query is:

"Find all the "name" from "studdet" where marks is greater than 80"

It is definitely going to be an aggregate query, because "find" cannot do this and "group" will not work in a sharded environment. So if I go with aggregate, what will the performance impact be? I need to call this query most of the time.
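To be concrete, the pipeline I have in mind looks roughly like this (the collection name is just a placeholder for my real collection):

    db.mycollection.aggregate([
        { $match: { "testdata.studdet.marks": { $gt: 80 } } },    // pre-filter documents so an index can be used
        { $unwind: "$testdata.studdet" },                         // one output document per array entry
        { $match: { "testdata.studdet.marks": { $gt: 80 } } },    // keep only the entries with marks above 80
        { $project: { _id: 0, name: "$testdata.studdet.name" } }  // return just the names
    ])

What worries me is the $unwind stage when "studdet" has more than 1000 entries per document.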

Phalguni Mukherjee
  • I think you should use a mix of both find and aggregate depending upon the use case. – Anuj Aneja Sep 14 '13 at 07:59
  • Maybe you should structure your documents differently or you just don't know which query command to use and how. It's hard to answer your question unless you give a specific example of some documents and what information you want to extract from them. – Philipp Sep 14 '13 at 16:20
  • @Philipp it's not about any specific example, and I know very well which query to use where, but the documentation says that unwind is a problem when the array content is huge, so I am not able to tell how feasible executing the aggregate will be. My question is not about how to use aggregate and get a result, but how feasible it will be in a production environment if I go with it; find is reliable, but it cannot pull data from a sub-array. – Phalguni Mukherjee Sep 14 '13 at 16:59
  • @Philipp I modified the question and added an example. – Phalguni Mukherjee Sep 15 '13 at 05:49
  • It seems to me like `testdata` should be a database and `studdet` should be a collection. Did you consider making the `studdet` entries stand-alone documents in their own collection instead of putting them in an array in a document? – Philipp Sep 15 '13 at 14:06
  • @Philipp testdata is a field, which contains an object with field "studdet" having an array of objects. – Phalguni Mukherjee Sep 15 '13 at 16:05
  • @PhalguniMukherjee I see that, but I am asking WHY you would do something like that, because it makes querying it unnecessarily hard. – Philipp Sep 15 '13 at 16:06
  • @Philipp, because this is the only scenario where I need to query like this; everywhere else I need to fetch both pieces of information together. Also, when creating a new document both are created at the same time, so if I move them to a different collection the operation will no longer be atomic. – Phalguni Mukherjee Sep 15 '13 at 16:10

1 Answer


Please have a look at: http://docs.mongodb.org/manual/core/data-modeling/ and http://docs.mongodb.org/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/#data-modeling-example-one-to-many

These documents describe the decisions involved in creating a good document schema in MongoDB. That is one of the hardest things to do in MongoDB, and one of the most important, because it will affect your performance. In your case a student collection with an array of grades per student looks to be the best bet: {_id: ..., grades: [{type: "test", grade: 80}, ...]}. Given your sample data set, the aggregation framework is generally the best choice; it is faster than map-reduce in most cases (certainly in execution speed, since it runs in C++ versus JavaScript for map-reduce).
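As a rough sketch (the collection and field names here are only illustrative), with a schema like that your query becomes a plain find with a projection rather than an aggregation:

    db.students.find(
        { "grades.grade": { $gt: 80 } },   // any grade above 80
        { name: 1, _id: 0 }                // return only the name
    )

A query like this works unchanged on a sharded cluster, and it can use an index on grades.grade.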
If your data's working set becomes so large that you have to shard, then aggregation, and everything else, will be slower; not, however, slower than putting everything on a single machine with a lot of page faults. Generally, sharding is the correct way to go only once your working set is larger than the RAM available on a modern machine, so that you can keep everything in RAM. (At that point a commercial support contract for MongoDB is going to cost less than the hardware, and it includes extensive help with schema design.)

If you need anything else please don’t hesitate to ask.

Best, Charlie

Charlie Page