Using weights for searching in mongoose

Question

So I've read through this but still am a little bit confused on how to go about this.

My model contains various fields that are Strings, Numbers and Boolean values.

$text seems like it can only take in strings.

What if I wanted to do a search like:

model.find({petsAllowed:true, rooms:4, house:"townhouse"}).sort()

So it should find all the entries in MongoDB that match what I'm inputting AND sort them based on how closely each entry matches the inputted fields.

I know mongoose supports this so I don't want to rely on a plugin.

Here's the result I want:

[ 
Document 1 (most closely matched with the input): 
    {petsAllowed:true, rooms:4, house:"townhouse"},
Document 2: {petsAllowed:false, rooms:4, house:"townhouse"},
Document 3: {petsAllowed:true, rooms:5, house:"townhouse"},
Document 4: {petsAllowed:false, rooms:3, house:"townhouse"}
]

Well `$text` is a fairly apt name for a "text search" operator since that is what it does. Really though, what would you expect from "nearest" to either Boolean or Numeric values? They are pretty much absolute. If you want something "fuzzy" from them then use "range" operators like `$gte` and `$lte` and possibly omit the boolean unless you actually **only** want to return results that meet those conditions. — Neil Lunn, Mar 06 '15 at 02:04
Well given the input fields, they should all be or'd together so if a document has the field petsAllowed as true then it should be included in the results. Then all those results should be sorted based on which result matches closest with what was inputted. — user1883614, Mar 06 '15 at 02:10
So you want as an example: `"petsAllowed": true` to be at the "top" of the results, but also include the `false` values below them. Sounds like what you are asking? More or less add to a weight score for each of those conditions that is an exact match and then sort the results with the highest score first. So put your thinking cap on. How would you calculate values in a MongoDB query and then use the calculated result to sort on? There is an answer, but it's not really efficient to do it that way. — Neil Lunn, Mar 06 '15 at 02:20
Well false would only be included if that document had 4 rooms or the house type was a townhouse. Yes that's exactly what I'm asking. I know that mongoose has a sort() function but again, I'm not exactly sure how to go about this (which is why I asked). I'm sorry if that's not what you were looking for but I'm a bit lost on how to search using mongoose. — user1883614, Mar 06 '15 at 02:26
Part of my initial point was that your question is far "too broad" in the possible interpretations of what you really want. Rather than chatting back and forth with me, try editing your question to explain exactly the results you expect to achieve given a set of conditions. Issuing the query in the form you currently have produces "exact" matches and not "fuzzy" searches. You can code fuzzy logic, but it means defining the conditions first, which you presently have not. Update your question with everything relevant. — Neil Lunn, Mar 06 '15 at 02:30
Added the results. I guess I would have to decide which fields are more important than other fields to get a better weighted result. — user1883614, Mar 06 '15 at 02:34

score 1 · Accepted Answer · answered Mar 06 '15 at 03:40

In order to "weight" a response, the basic principle is that you have to determine which parts of a result are more important to the search you are performing, and then assign each an appropriate score in order of importance according to your rules.

This is really a MongoDB thing and not an externally coded thing, as you need to analyse the results on the server, especially when you are considering something like "paging" the weighted results when there are a lot of them. To do this on the server you need the .aggregate() method.

Working through this I already had my own data sample while waiting for your input, but it still serves as an example. Consider this initial sample:

{ "petsAllowed" : true,  "rooms" : 5, "type" : "townhouse" }
{ "petsAllowed" : false, "rooms" : 4, "type" : "house"     }
{ "petsAllowed" : true,  "rooms" : 4, "type" : "townhouse" }
{ "petsAllowed" : false, "rooms" : 4, "type" : "townhouse" }
{ "petsAllowed" : true,  "rooms" : 2, "type" : "townhouse" }
{ "petsAllowed" : true,  "rooms" : 3, "type" : "townhouse" }
{ "petsAllowed" : true,  "rooms" : 4, "type" : "house"     }

So that also includes a "type" where we are also going to be "fuzzy" in the match and not just determine "exact" matches. Using the aggregation pipeline and setting up the logic from your inputs is basically like this:

 var roomsWanted = 4,
     exact = "townhouse",
     types = [];

 // Some logic to get the "fuzzy" values
 var fuzzy = [/house/]

 // Combine exact and fuzzy    
 types.push(exact);
 fuzzy.forEach(function(fuzz) {
     types.push(fuzz);
 });

 // Perform the query
 db.houses.aggregate([
     // Match items you want and exclude others
     { "$match": { 
         "type": { "$in": types }, 
         "$or": [
             { "rooms": { "$gte": roomsWanted } },
             { "rooms": roomsWanted - 1 }
         ]
     }},

     // Calculate a score
     { "$project": {
         "petsAllowed": 1,
         "rooms": 1,
         "type": 1,
         "score": {
             "$add": [
                 // Exact match is higher than the fuzzy ones
                 // Fuzzy ones score lower than other possible matches
                 { "$cond": [
                     { "$eq": [ "$type", exact ] },
                     20,
                     2
                 ]},
                 // When petsAllowed is true you want a weight
                 { "$cond": [
                     "$petsAllowed",
                     10,
                     0
                 ]},
                 // Score depending on the roomsWanted
                 { "$cond": [
                     { "$eq": [ "$rooms", roomsWanted ] },
                     5,
                     { "$cond": [
                         { "$gt": [ "$rooms", roomsWanted ] },
                         4,
                         { "$cond": [
                             { "$eq": [ "$rooms", roomsWanted - 1 ] },
                             3,
                             0
                         ]}
                     ]}
                 ]}
             ]
         }
     }},
     { "$sort": { "score": -1 } },
 ])

The results you get are then sorted by the generated "score" like so:

{ "petsAllowed" : true,  "rooms" : 4, "type" : "townhouse", "score" : 35 }
{ "petsAllowed" : true,  "rooms" : 5, "type" : "townhouse", "score" : 34 }
{ "petsAllowed" : true,  "rooms" : 3, "type" : "townhouse", "score" : 33 }
{ "petsAllowed" : false, "rooms" : 4, "type" : "townhouse", "score" : 25 }
{ "petsAllowed" : true,  "rooms" : 4, "type" : "house",     "score" : 17 }
{ "petsAllowed" : false, "rooms" : 4, "type" : "house",     "score" : 7  }

Breaking that down into what is happening here, the first thing is my own decision that I possibly want anything that contains "house" in the "type" as well as any "exact matches" for the type that was selected. That's arbitrary logic to determine that, but the point is that we are going to consider both in this example.

Of course the search will want to filter out anything that you really don't want, so there is a $match pipeline stage to do this. The $in operator is used to match "type" to either the exact "townhouse" term or to a possible regular expression match of /house/. That's because I want it to, and your mileage may vary on what it is you really want to do.

Also there is a condition to look for the number of rooms. Again, an arbitrary decision here is that I will consider anything with four rooms or more, hence the **$gte** condition. I also want to consider things that have one less room than was asked for. Arbitrary logic again, but just to demonstrate the point of what you do when you want this.

After $match has done its "filtering", you move the results to the $project stage. The main point here is that you want a calculated "score" value, but you also must specify all of the fields you want to return when using this pipeline stage.

Here is where you make some choices over which "weight" to apply to each condition. The $add operator will "sum" the results that are given as its arguments, which are in turn produced by the $cond or "conditional" operator.

This is a "ternary" operator, in that it evaluates a logical "if" condition as the first argument, then returns either the second argument when true or the third argument when false. Like any ternary, when you want to test different conditions you "nest" the operators within the false argument in order to "flow through" them.
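
If it helps to see the nesting, here is a plain JavaScript mirror of the "score" expression above. This is only an illustration of the logic: the function name `score` and its parameters are made up for this sketch, and the real calculation still runs on the server inside $project.

```javascript
// Plain-JavaScript mirror of the "score" expression, for illustration only.
// `score`, `roomsWanted` and `exactType` are names assumed for this sketch.
function score(doc, roomsWanted, exactType) {
    // Exact type match weighs more than a merely "fuzzy" one
    var typeScore = (doc.type === exactType) ? 20 : 2;

    // Weight when petsAllowed is true
    var petsScore = doc.petsAllowed ? 10 : 0;

    // Nested ternaries "flow through" the false branch, like nested $cond
    var roomsScore =
        (doc.rooms === roomsWanted) ? 5 :
        (doc.rooms > roomsWanted) ? 4 :
        (doc.rooms === roomsWanted - 1) ? 3 : 0;

    return typeScore + petsScore + roomsScore;
}

console.log(score({ petsAllowed: true,  rooms: 4, type: "townhouse" }, 4, "townhouse")); // 35
console.log(score({ petsAllowed: false, rooms: 4, type: "house" },     4, "townhouse")); // 7
```

Those two values match the first and last rows of the sorted results shown earlier.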

Once a "score" has been determined you $sort the results in the order of the largest "score" first.

Implementing paging can be done in either a traditional form by adding $skip and $limit pipeline stages at the end of the pipeline, or by more involved "forward paging" by keeping the last value(s) seen and excluding those from the results looking for a "score" $lte the last "score" that was seen. That's another topic in itself, but it all depends on what sort of paging concept suits your application the best.
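
As a rough sketch of the "traditional" form, the paging stages can simply be appended to whatever pipeline you already have. The helper name `withPaging` and its 1-based `page` argument are assumptions for this example, not anything built in. Adding "_id" as a tie-breaker in the $sort is also my own suggestion, since many documents can share the same "score" and their relative order is otherwise not guaranteed between pages.

```javascript
// Hypothetical helper: append $skip and $limit to the end of a pipeline.
// `page` is 1-based; `pageSize` is the number of documents per page.
function withPaging(pipeline, page, pageSize) {
    return pipeline.concat([
        { "$skip": (page - 1) * pageSize },
        { "$limit": pageSize }
    ]);
}

// Sort by score, with "_id" as a tie-breaker so equal scores page consistently
var base = [ { "$sort": { "score": -1, "_id": 1 } } ];
var paged = withPaging(base, 3, 10);

console.log(JSON.stringify(paged));
```

The original `base` array is left untouched, so the same stages can be reused for every page request.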

Of course for some of the logic like "petsAllowed", you only want those conditions to calculate a weight when it's actually valid to the selection criteria you want. Part of the beauty of the aggregation pipeline syntax and indeed all MongoDB query syntax is that whatever the language implementation it is basically just a representation of a "data structure". So you can just "build" the pipeline stages as required from your inputs as you would any data structure in code.
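
A minimal sketch of that "building" idea might look like the following. Everything named here (`buildPipeline`, the `criteria` object and its keys) is hypothetical, the weights just reuse the ones from the example, and the inputs are "or'd" together in $match on the assumption that this is the behaviour wanted. Only the inputs actually supplied contribute a filter and a weight.

```javascript
// Sketch only: build $match and the score terms from whichever inputs exist.
// `buildPipeline` and `criteria` are assumed names, not part of any API.
function buildPipeline(criteria) {
    var or = [];     // a document qualifies if it matches any supplied input
    var terms = [];  // one $cond weight per supplied input

    if (criteria.type !== undefined) {
        or.push({ "type": criteria.type });
        terms.push({ "$cond": [ { "$eq": [ "$type", criteria.type ] }, 20, 0 ] });
    }
    if (criteria.petsAllowed !== undefined) {
        or.push({ "petsAllowed": criteria.petsAllowed });
        terms.push({ "$cond": [ { "$eq": [ "$petsAllowed", criteria.petsAllowed ] }, 10, 0 ] });
    }
    if (criteria.rooms !== undefined) {
        or.push({ "rooms": criteria.rooms });
        terms.push({ "$cond": [ { "$eq": [ "$rooms", criteria.rooms ] }, 5, 0 ] });
    }

    var pipeline = [];
    if (or.length > 0) {
        pipeline.push({ "$match": { "$or": or } });
    }
    pipeline.push({ "$project": {
        "petsAllowed": 1,
        "rooms": 1,
        "type": 1,
        // Leading 0 keeps $add well-formed even when no inputs were given
        "score": { "$add": [ 0 ].concat(terms) }
    }});
    pipeline.push({ "$sort": { "score": -1 } });
    return pipeline;
}

var pipeline = buildPipeline({ rooms: 4, type: "townhouse" });
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array is what you would hand to db.houses.aggregate() in the shell, or to the .aggregate() method of your mongoose model.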

Those are the principles, but of course everything comes at a cost and calculating these weightings "on the fly" is not just a simple query where the values can be looked up in an index.

Thank you so much for this! It definitely clears up a lot of things. — user1883614, Mar 06 '15 at 03:44
@user1883614 I had some spare lunch time. Naturally your weighting rules will vary, but at least this should be a basic template to follow in implementing your actual logic. — Neil Lunn, Mar 06 '15 at 03:49
The only issue is that there are about 6 fields max but not all will be used every time I call search so I'll see how to go about that. — user1883614, Mar 06 '15 at 04:02
@user1883614 This is exactly what I said about "building the data structure". There are examples of this on stackoverflow (I know since I've written some answers that do that), or you could always ask another question about how to "build" your conditions based on your input if you get stuck. — Neil Lunn, Mar 06 '15 at 04:06
I'll see if I can get something to work with all of this or I will ask another question soon. — user1883614, Mar 06 '15 at 04:12
Alright, how would you build a data structure based on the inputs? I thought about creating an array of objects but nodejs gives me errors. — user1883614, Mar 08 '15 at 02:19
