0

I have a mongo collection containing docs such as this:

{
    "_id" : ObjectId("57697321c22d3917acd66513"),
    "parent" : "AlphaNumericID",
    "signature" : "AnotherAlphaNumericID",
    "price" : 1638,
    "url" : "http://www.thecompany.com/path/to/page1",
    "date" : ISODate("2016-06-21T17:02:20.352Z"),
    "valid" : true
}

What I am trying to do is to run one query that would group on signature filed, return min and max price AND corresponding url:

{
        "signature" : "AnotherAlphaNumericID",  
        "min_price" : 1504,
        "min_rent_listing" : "http://www.thecompany.com/path/to/page1",
        "max_price" : 1737,
        "max_price_listing" : "http://www.thecompany.com/path/to/page2",
}

Running a $group on $signature field to obtain $min and $max is straight forward but in order to get the actual urls I split the query into 2 with the first query returning a sorted list of docs using $signature with prices from min to max and then (in python code) taking the first and last element. This works fine but would be nice to have one query.

Thoughts?

p.s.

Also 'toyed' with running one query for min and one for max and 'zipping' the results.

zevij
  • 2,416
  • 1
  • 23
  • 32

1 Answers1

2

You can play a trick with help of $group and $project. Assuming dataset is

{ 
    "_id" : ObjectId("57db28dc705af235a826873a"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 1638.0, 
    "url" : "http://www.thecompany.com/path/to/page1", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}
{ 
    "_id" : ObjectId("57db28dc705af235a826873b"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 168.0, 
    "url" : "http://www.thecompany.com/path/to/page2", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}
{ 
    "_id" : ObjectId("57db28dc705af235a826873c"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 163.0, 
    "url" : "http://www.thecompany.com/path/to/page3", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}
{ 
    "_id" : ObjectId("57db28dc705af235a826873d"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 1680.0, 
    "url" : "http://www.thecompany.com/path/to/page4", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}

Try following query in shell

db.collection.aggregate([
   {$sort:{price:1}},
   {$group:{
       _id:"$signature", 
       _first:{$first:"$url"},
       _last:{$last:"$url"}, 
       _min:{$first:"$price"}, 
       _max:{$last:"$price"}}
   },
   {$project:{
     _id:0, 
     min:{
       url:"$_first", 
       price:"$_min"}, 
     max:{
       url:"$_last", 
       price:"$_max"}}
   }
])

Output will be with minimum/maximum price and corresponding url

{ 
    "min" : {
        "url" : "http://www.thecompany.com/path/to/page3", 
        "price" : 163.0
    }, 
    "max" : {
        "url" : "http://www.thecompany.com/path/to/page4", 
        "price" : 1680.0
    }
}

What I changed from original answer: _min:{$min:"$price"}, --> to use $first _max:{$max:"$price"}} --> to use $last

Reason: we go into the pipeline with an ascending sort on price. By default, first record is min and last record is max.

zevij
  • 2,416
  • 1
  • 23
  • 32
Saleem
  • 8,728
  • 2
  • 20
  • 34
  • question: is there a way to run the query so that it returns min, max and latest? In your answer, first/last = min/max as we sort on price:1. Could we take the result, and 'enrich' it with 'latest' or do I have to stick to two queries whereby the first query is the above and a separate query to sort by date:-1 and take first result? – zevij Sep 17 '16 at 18:45
  • Well, best way to find it is to give it a try. – Saleem Sep 17 '16 at 18:47
  • Guess not in one query cause I need to sort by price for min/max. For latest I have to sort by date. The sorts cannot be 'consecutive' but rather standalone... – zevij Sep 17 '16 at 21:18