3

I have this query:

db.collection.aggregate([
  {
    $group: {
      _id: "$PATIENT_ID", 
      MASS: { $max: "$max_field" }
    }
  }
]); 

The result of this query is:

{ "_id" : "TCGA-A2-A0YD", "mass" : 0.994683301742671 }
{ "_id" : "TCGA-A2-A3XX", "mass" : 0.993455527786917 }

I have this second query:

db.collection.aggregate([
  {
    $group: {
      _id: "TCGA-A2-A3XX",
      MASS: { $max: "$max_field" }
    }
  }
]);

The result of this second query is:

{ "_id" : "TCGA-A2-A3XX", "mass" : 0.994683301742671 }

Why? I expected another value for this id TCGA-A2-A3XX

DAXaholic
  • 33,312
  • 6
  • 76
  • 74
mydb
  • 45
  • 1
  • 9

1 Answers1

3

Your problem is that you are grouping all documents by the constant "TCGA-A2-A3XX" and therefore you get a single group with that ID and calculate the maximum of the field from all documents.
I guess you want to add a $match stage to your pipeline to filter for your specific patient.

db.collection.aggregate([
  { 
    $match: {
      PATIENT_ID: "TCGA-A2-A3XX"
    }
  },
  {
    $group: {
      _id: "$PATIENT_ID",
      MASS: { $max: "$max_field" }
    }
  }
]);
DAXaholic
  • 33,312
  • 6
  • 76
  • 74
  • Thanks for the answer. I don't know why in the first case I obtain : TCGA-A2-A3XX, 0.993455527786917 and in the second case I obtain TCGA-A2-A3XX 0.994683301742671 – mydb Jun 28 '16 at 17:24
  • 2
    Because in the first case you use "$PATIENT_ID" which refers to the documents' field "PATIENT_ID" and so you group by the values of that field. In the other case you group by the constant which will always result in a single group. – DAXaholic Jun 28 '16 at 17:26
  • Ok, but TCGA-A2-A3XX is one value of PATIENT_ID field – mydb Jun 28 '16 at 17:27
  • 1
    Maybe it helps to understand it if you try $group: {_id : "foobar" ... -> Same result – DAXaholic Jun 28 '16 at 17:28
  • 2
    You just tell the aggregation pipeline to group all documents by the constant and because it is a constant (not a path to a document field), there is built a single group with all documents and for this group with all docs (including the one with 0.994683301742671) you calculate the maximum – DAXaholic Jun 28 '16 at 17:30
  • ok, I understand thanks. so for insert where clause? WHERE PATIENT_ID = TCGA-A2-A3XX ? – mydb Jun 28 '16 at 17:31
  • 1
    I updated my answer. I can't test it for you 100% bc I only can guess what the documents look like but I hope it works - otherwise please share some sample documents – DAXaholic Jun 28 '16 at 17:44