If I understand your question, you have data something like the following:
db.users.insert({_id: 100, likes: [
'pina coladas',
'long walks on the beach',
'getting caught in the rain'
]})
db.users.insert({_id: 101, likes: [
'cheese',
'bowling',
'pina coladas'
]})
db.users.insert({_id: 102, likes: [
'pina coladas',
'long walks on the beach'
]})
db.users.insert({_id: 103, likes: [
'getting caught in the rain',
'bowling'
]})
db.users.insert({_id: 104, likes: [
'pina coladas',
'long walks on the beach',
'getting caught in the rain'
]})
and you want to compute, for a given user, how many features ('likes' in this example) they have in common with each of the other users? The following aggregation pipeline will accomplish this:
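var user = 100
var user_likes = db.users.findOne({_id: user}).likes
var return_only = 2 // number of matches to return
db.users.aggregate([
    // one document per (user, like) pair
    {$unwind: '$likes'},
    // keep only other users' likes that the given user shares
    {$match: {
        $and: [
            {_id: {$ne: user}},
            {likes: {$in: user_likes}}
        ]
    }},
    // count the shared likes per user
    {$group: {_id: '$_id', common: {$sum: 1}}},
    // most matches first, then keep only the top few
    {$sort: {common: -1}},
    {$limit: return_only}
])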
Given the example input data above, this will output the following result, showing the top 2 matches:
{
"result" : [
{
"_id" : 104,
"common" : 3
},
{
"_id" : 102,
"common" : 2
}
],
"ok" : 1
}
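If it helps to see what each stage contributes, here is a rough sketch of the same logic in plain JavaScript, run in memory against the sample data rather than against MongoDB. The function name topMatches is just something I made up for illustration; it is not part of any MongoDB API.

```javascript
// In-memory sketch of the pipeline stages; topMatches is a hypothetical helper.
const users = [
  {_id: 100, likes: ['pina coladas', 'long walks on the beach', 'getting caught in the rain']},
  {_id: 101, likes: ['cheese', 'bowling', 'pina coladas']},
  {_id: 102, likes: ['pina coladas', 'long walks on the beach']},
  {_id: 103, likes: ['getting caught in the rain', 'bowling']},
  {_id: 104, likes: ['pina coladas', 'long walks on the beach', 'getting caught in the rain']}
];

function topMatches(users, userId, returnOnly) {
  const userLikes = new Set(users.find(u => u._id === userId).likes);

  const matched = users
    // like $unwind: one row per (user, like) pair
    .flatMap(u => u.likes.map(like => ({_id: u._id, likes: like})))
    // like $match: other users only, and only likes the given user shares
    .filter(row => row._id !== userId && userLikes.has(row.likes));

  // like $group with $sum: 1, counting shared likes per user
  const counts = new Map();
  for (const row of matched) {
    counts.set(row._id, (counts.get(row._id) || 0) + 1);
  }

  return [...counts]
    .map(([_id, common]) => ({_id, common}))
    .sort((a, b) => b.common - a.common) // like $sort: {common: -1}
    .slice(0, returnOnly);               // like $limit
}

console.log(topMatches(users, 100, 2));
```

For user 100 and a limit of 2 this gives the same two entries as the aggregation result above (104 with 3 in common, 102 with 2).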
Note that I assumed you want only the top few matches, since there may be a very large number of users. The $sort stage followed by the $limit stage accomplishes this. If that is not the case, you can simply omit the last two stages of the pipeline.
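If the collection really is large, one refinement you might try (I have not measured it on your data, so treat it as a sketch) is to add a $match before the $unwind, so that users who share nothing with the given user are dropped before their arrays are unwound. A $match at the start of a pipeline can also use an index on likes if you have one. The helper name buildPipeline below is hypothetical; it only assembles the stage array.

```javascript
// buildPipeline is a hypothetical helper; it only builds the array of stages.
function buildPipeline(user, userLikes, returnOnly) {
  return [
    // drop the given user and anyone sharing no likes, before unwinding
    {$match: {_id: {$ne: user}, likes: {$in: userLikes}}},
    {$unwind: '$likes'},
    // after unwinding, keep only the likes that are actually shared
    {$match: {likes: {$in: userLikes}}},
    {$group: {_id: '$_id', common: {$sum: 1}}},
    {$sort: {common: -1}},
    {$limit: returnOnly}
  ];
}
```

In the shell you would then call db.users.aggregate(buildPipeline(user, user_likes, return_only)). The result should be the same as with the pipeline above; only the amount of data flowing through the $unwind changes.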
I hope this helps! Let me know if you have further questions.
Bruce