I have documents with an attached sparse vector, like this:
{
"_id" : ObjectId,
"vec" : [
{
"dim" : 1,
"weight" : 8
},
{
"dim" : 3,
"weight" : 3
}
]
}
I am trying to compute the normalised dot product between an input vector of the same format and every document in the collection. I can do it with the rather cumbersome aggregation pipeline below, but I am wondering whether there is a more efficient method.
[
{$unwind: "$vec"},
{$project: {
squareWeight: {$multiply: ["$vec.weight","$vec.weight"]}, //for the norm
dim: "$vec.dim",
weight: "$vec.weight",
inputVec: {$literal:[{dim:2,weight: 5},{dim:5, weight:2}]} //input vector
}},
{$project: {
dim: 1,
squareWeight: 1,
scores: {
$map: { //multiplying each input element with the vector weight
input: "$inputVec",
as: "input",
in: {$cond: [
{$eq: ["$$input.dim","$dim"]},
{$multiply: ["$$input.weight", "$weight"]},
0
]} //in
} //map
} //scores
}}, //project
{$unwind: "$scores"},
{$project: {
scores :1,
squareWeight: {
$cond: [{$eq: ["$scores", 0]}, 0, "$squareWeight"] //to avoid multiple counting
}
}},
{$group: {
_id: "$_id",
score: {$sum: "$scores"},
squareSum: {$sum: "$squareWeight"}
}}
]
I can then calculate the normalised result as score / (sqrt(squareSum) * ||inputVec||).
This does not feel like the most efficient way, so I am looking for improvements.
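For reference, here is a minimal sketch in plain JavaScript of the value the pipeline is meant to produce. It assumes the norm of the document vector is taken over all entries of vec, and the function names norm and normalisedDot are made up for illustration; it is only a client-side check of the formula, not a replacement for the aggregation.

```javascript
// Euclidean norm of a sparse vector stored as [{dim, weight}, ...].
function norm(vec) {
  return Math.sqrt(vec.reduce((sum, e) => sum + e.weight * e.weight, 0));
}

// Normalised dot product of two sparse vectors in the same format.
// Assumption: each dim appears at most once per vector.
function normalisedDot(docVec, inputVec) {
  // Index the input vector by dimension for constant-time lookups.
  const inputByDim = new Map(inputVec.map(e => [e.dim, e.weight]));

  // Only dimensions present in both vectors contribute to the score.
  let score = 0;
  for (const e of docVec) {
    score += e.weight * (inputByDim.get(e.dim) || 0);
  }

  // Guard against empty or all-zero vectors.
  const denom = norm(docVec) * norm(inputVec);
  return denom === 0 ? 0 : score / denom;
}

// Example with one shared dimension (dim 1); the values are illustrative only.
const docVec = [{dim: 1, weight: 8}, {dim: 3, weight: 3}];
const inputVec = [{dim: 1, weight: 2}, {dim: 5, weight: 2}];
console.log(normalisedDot(docVec, inputVec));
```

Running this against a handful of documents fetched from the collection gives something to compare the pipeline output with.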
Thanks.