I have documents with an attached sparse vector, like this:

{
 "_id" : ObjectId,
 "vec" : [ 
    {
        "dim" : 1,
        "weight" : 8
    }, 
    {
        "dim" : 3,
        "weight" : 3
    }
  ]
}

I am trying to compute the normalised dot product between an input vector of the same format and every document in the collection. I can do it with the rather cumbersome aggregation pipeline below, but I am wondering whether there is a more efficient method.

[
  {$unwind: "$vec"},
  {$project: {
    squareWeight: {$multiply: ["$vec.weight","$vec.weight"]}, //for the norm
    dim: "$vec.dim",
    weight: "$vec.weight",
    inputVec: {$literal:[{dim:2,weight: 5},{dim:5, weight:2}]} //input vector
  }},
  {$project: {
    dim: 1,
    squareWeight: 1,
    scores: {
      $map: { //multiplying each input element with the vector weight
        input: "$inputVec",
        as: "input",
        in: {$cond: [
          {$eq: ["$$input.dim","$dim"]},
          {$multiply: ["$$input.weight", "$weight"]},
          0
        ]}  //in
      }  //map
    }  //scores
  }},  //project
  {$unwind: "$scores"},
  {$project: {
    scores: 1,
    squareWeight: {
      $cond: [{$eq: ["$scores", 0]}, 0, "$squareWeight"] //to avoid multiple counting
    }
  }},
  {$group: {
    _id: "$_id",
    score: {$sum: "$scores"},
    squareSum: {$sum: "$squareWeight"}
  }}
]

I can now calculate the normalised result as score / (sqrt(squareSum) * ||inputVec||).
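For illustration, the final normalisation step could be done client-side roughly as follows. This is only a minimal sketch: the helper names `sparseNorm` and `normalisedScore` are hypothetical, the example row is made-up data in the shape the `$group` stage produces, and returning 0 for a zero denominator is an assumption, not something the pipeline dictates.

```javascript
// Euclidean norm of a sparse vector given as [{dim, weight}, ...].
function sparseNorm(vec) {
  return Math.sqrt(vec.reduce((acc, e) => acc + e.weight * e.weight, 0));
}

// Applies score / (sqrt(squareSum) * ||inputVec||) to one grouped row.
// Assumption: return 0 when the denominator is zero instead of dividing.
function normalisedScore(row, inputVec) {
  const denom = Math.sqrt(row.squareSum) * sparseNorm(inputVec);
  return denom === 0 ? 0 : row.score / denom;
}

// Hypothetical output row of the $group stage (not real data).
const exampleRow = { _id: "doc1", score: 12, squareSum: 9 };
const inputVec = [{ dim: 3, weight: 4 }, { dim: 5, weight: 3 }];

const result = normalisedScore(exampleRow, inputVec);
console.log(result);
```

Since ||inputVec|| is the same for every document, it only needs to be computed once per query and can be reused for all rows.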

This does not feel like the most efficient way, so I am looking for improvements.

Thanks.

  • I think that's the way to do it. I agree it's not pretty but your use case is on the margins of the aggregation framework's area and several operators outside would have to be added to make the pipeline "pretty". – wdberkeley Feb 08 '15 at 00:53
  • You mean adding operators outside by inserting helper documents into the collection? Would this be more efficient from a performance point of view? The operation doesn't need to look "pretty", but I am wondering about the most performant way of doing it. Maybe with Map Reduce instead of the aggregation, or a completely different approach on the same DB format. Thank you. – Hans Mündelein Feb 08 '15 at 07:50
  • No, I mean MongoDB engineering would need to add more operators like `$sum`, etc. – wdberkeley Feb 09 '15 at 15:57
