0

Why does MongoDB not support queries of properties of embedded documents that are stored using hashes?

For example say you have a collection called "invoices" which was created like this:

db.invoices.insert(
    [
        {
            productsBySku: {
                12432: {
                    price: 49.99,
                    qty_in_stock: 4
                },
                54352: {
                    price: 29.99,
                    qty_in_stock: 5
                }
            }
        },
        {
            productsBySku: {
                42432: {
                    price: 69.99,
                    qty_in_stock: 0
                },
                53352: {
                    price: 19.99,
                    qty_in_stock: 5
                }
            }
        }
    ]
);

With such a structure, MongoDB queries with $elemMatch, dot syntax, or the positional operator ($) fail to access any of the properties of each productsBySku member.

For example you can't do any of these:

db.invoices.find({"productsBySku.qty_in_stock":0});

db.invoices.find({"productsBySku.$.qty_in_stock":0});

db.invoices.find({"productsBySku.qty_in_stock":{$elemMatch:{$eq:0}}});

db.invoices.find({"productsBySku.$.qty_in_stock":{$elemMatch:{$eq:0}}});

To find out-of-stock products therefore you have to resort to using a $where query like:

db.invoices.find({
    $where: function () {
        for (var i in this.productsBySku)
            if (!this.productsBySku[i].qty_in_stock)
                return this;
    }
});

On a technical level... why did they design MongoDB with this very severe limitation on queries? Surely there must be some kind of technical reason for this seeming major flaw. Is this inability to deal with an a list of objects as an array, ignoring the keys, just a limitation of JavaScript as a language? Or was this the result of some architectural decision within MongoDB?

Just curious.

CommaToast
  • 11,370
  • 7
  • 54
  • 69
  • I cant insert your sample into my collection. I think it need some extra { after productsBySku – Disposer Nov 17 '14 at 08:29
  • Tbh I really don't know, you could do queries on object propeties quite easily iterating each level and since all of the methods that MongoDB supports could also be used on objects it does seem kind of counter-intuitive that objects are not supported in querying. Not only that but removing by object indexes also is faster instead of having to search every array for the index – Sammaye Nov 17 '14 at 10:34

2 Answers2

3

As a rule of thumb: Usually, these problems aren't technical ones, but problems with data modeling. I have yet to find a use case where it makes sense to have keys hold semantic value.

If you had something like

'products':[
     {sku:12432,price:49.99,qty_in_stock:4},
     {sku:54352,price:29.99,qty_in_stock:5}
 ]

It would make a lot more sense.

But: you are modelling invoices. An invoice should – for many reasons – reflect a status at a certain point in time. The ever changing stock rarely belongs to an invoice. So here is how I would model the data for items and invoices

{
  '_id':'12432',
  'name':'SuperFoo',
  'description':'Without SuperFoo, you can't bar or baz!',
  'current_price':49.99
 }

Same with the other items.

Now, the invoice would look quite simple:

{ _id:"Invoice2",
  customerId:"987654"
  date:ISODate("2014-07-07T12:42:00Z"),
  delivery_address:"Foo Blvd 42, Appt 42, 424242 Bar, BAZ"
  items:
    [{id:'12432', qty: 2, price: 49.99},
     {id:'54352', qty: 1, price: 29.99}
    ]
}

Now the invoice would hold things that may only be valid at a given point in time (prices and delivery address may change) and both your stock and the invoices are queried easily:

// How many items of 12432 are in stock?
db.products.find({_id:'12432'},{qty_in_stock:1})

// How many items of 12432 were sold during July and what was the average price?
db.invoices.aggregate([
  {$unwind:"$items"},
  {
    $match:{
      "items.id":"12432",
      "date":{
         $gt:ISODate("2014-07-01T00:00:00Z"),
         $lt:ISODate("2014-08-01T00:00:00Z")
      }
     }
   },
   {$group : { _id:"$items.id", count: { $sum:"$items.qty" }, avg:{$avg:"$items.price"} } }
])

// How many items of each product sold did I sell yesterday?
db.invoices.aggregate([
  {$match:{ date:{$gte:ISODate("2014-11-16T00:00:00Z"),$lt:ISODate("2014-11-17T00:00:00Z")}}},
  {$unwind:"$items"},
  {$group: { _id:"$items.id",count:{$sum:"$qty"}}}
])

Combined with the query on how many items of each product you have in stock, you can find out wether you have to order something (you have to do that calculation in your code, there is no easy way to do this in MongoDB).

You see, with a "small" change, you get a lot of questions answered.

And that's basically how it works. With relational data, you model your data so that the entities are reflected properly and then you ask

How do I get my answers out of this data?

In NoSQL in general and especially with MongoDB you first ask

Which questions do I need to get answered?

and model your data accordingly. A subtle, but important difference.

Markus W Mahlberg
  • 19,711
  • 6
  • 65
  • 89
  • 1
    +1, I couldn't agree more, especially with the "keys shouldn't hold semantic value" paradigm and the general approach to data modelling in NoSQL. Just because you don't need to declare a schema upfront doesn't mean you don't have to think about indexing, query patterns and selectivity... – mnemosyn Nov 17 '14 at 14:00
  • @mnemosyn it is of course good to remember that most aggregation and query functions in MongoDB have semantic key names wthin objects, it is clearly not a pattern that applies internally to MongoDB itself it seems – Sammaye Nov 20 '14 at 11:22
  • How can you say that you've never come across a situation where semantically useful keys are nice to have? Consider the following: `$qty = $order[$marketOrderItemID][$sku][$warehouse]['qty'];` Knowing the values of the various keys from an API XML response and loading the order object from MongoDB by its ID, I can then get at a very specific nugget of information without using any foreach loops or other expensive operations. I'm not sure why meaningfulness is considered evil; it seems that such useful keys ought to present use-cases 100% of the time vs. useless, meaningless ones. – CommaToast Nov 21 '14 at 15:01
  • Obviously the data structure I just mentioned would not work even if MongoDB dot-syntax/positional operator was object-oriented instead of just array-oriented, but I would be happy with being able to just use one layer, to be honest. I guess my point is just this: when dealing with sets of objects where the order they are in does not make a lick of difference, using arbitrary sequences to order them seems logically pointless, except for the fact that databases like MongoDB force you to do so by design limitations, which are not even technical limitations, if what you're saying is true. – CommaToast Nov 21 '14 at 15:06
  • As for the design of the data model, if the data I was modelling was as simple as what you've presented above, it would be fine, but what I have to keep track of is much more complicated than simple invoices. Invoices are involved but they are like the leaf on the edge of the furthest branch of an ever-changing tree. I think the answer for me is to build constructor classes for my objects that makes the keys useful and work within MongoDBs limitations when they go into the datastore. +1 for the answer, though I don't agree philosophically your points are well-put and useful. – CommaToast Nov 21 '14 at 15:12
1

If I am honest I am not sure, you would have to ask MongoDB Inc. (10gen) themselves. I will attempt to explain some of my reasoning.

I have searched on Google a little and nothing seems to appear: https://www.google.co.uk/search?q=mognodb+jira+support+querying+objects&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a&channel=fflb&gfe_rd=cr&ei=as9pVOW3OMyq8wfhtYCgCw#rls=org.mozilla:en-GB:official&channel=fflb&q=mongodb+jira+querying+objects

It is quick to see how using objectual propeties for keys could be advantageous, for example: remove queries would not have to search every object and its properties within the array but instead just find the single object property in the parent object and unset it. Essentially it would be the difference of:

[
    {id:1, d:3, e:54},
    {id:3, t:6, b:56}
]

and:

{
    1: [d:3, e: 54],
    3: [t:6, b:56]
}

with the latter, obviously, being a lot quicker to delete an id of 3.

Not only that but all array operations that MongoDB introduces, from $elemMatch to $unwind would work wth objects as well, I mean how is unwinding:

[
    {id:5, d:4}
]

much different to unwinding:

{
    5: {d:4}
}

?

So, if I am honest, I cannot answer your question. There is no defense on Google as to their decision and there is no extensive talk from what I can find.

In fact I went as far as to search up on this a couple of times, including: https://www.google.co.uk/search?q=array+operations+that+do+not+work+on+objects&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a&channel=fflb&gfe_rd=cr&ei=DtNpVLrwDsPo7AaH4oCoDw and I found results that went as far as underscore.js who actually comply their array functions to all objects as well.

The only real reason, I can think of, is standardisation. Instead of catering to all minority groups etc on how subdocuments may work they just cater to a single minority turned majority by their choice.

It is one of the points about MongoDB which does confuse me even now, since there are many times within my own programming where it seems advantageous for speed and power to actually use objects instead of arrays.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • 1
    I kind of agree with you. But every single time I had to get past that limitation, I got a data model which made much more sense than the original one. – Markus W Mahlberg Nov 17 '14 at 11:15
  • @Sammaye thank you, you have basically answered my question. Isn't it ironic that although the point of the NoSQL movement was to move us away from "spreadsheet-like" structures, we are still forcing numerical ordering upon unordered collections of data? Fortunately MongoDB is an open-source project, so I guess if I really care that much about this, I could try to make a pull request, and probably answer my own question in the process :P – CommaToast Nov 21 '14 at 15:20