I'm building an application that could be likened to a dating application.
I've got some documents with a structure like this:
$ db.profiles.find().pretty()
[
{
"_id": 1,
"firstName": "John",
"lastName": "Smith",
"fieldValues": [
"favouriteColour|red",
"food|pizza",
"food|chinese"
]
},
{
"_id": 2,
"firstName": "Sarah",
"lastName": "Jane",
"fieldValues": [
"favouriteColour|blue",
"food|pizza",
"food|mexican",
"pets|yes"
]
},
{
"_id": 3,
"firstName": "Rachel",
"lastName": "Jones",
"fieldValues": [
"food|pizza"
]
}
]
What I'm trying to do is identify profiles that match each other on one or more fieldValues.
So, in the example above, my ideal result would look something like:
<some query>
result:
[
{
"_id": "507f1f77bcf86cd799439011",
"dateCreated": "2013-12-01",
"profiles": [
{
"_id": 1,
"firstName": "John",
"lastName": "Smith",
"fieldValues": [
"favouriteColour|red",
"food|pizza",
"food|chinese"
]
},
{
"_id": 2,
"firstName": "Sarah",
"lastName": "Jane",
"fieldValues": [
"favouriteColour|blue",
"food|pizza",
"food|mexican",
"pets|yes"
]
}
]
},
{
"_id": "356g1dgk5cf86cd737858595",
"dateCreated": "2013-12-02",
"profiles": [
{
"_id": 1,
"firstName": "John",
"lastName": "Smith",
"fieldValues": [
"favouriteColour|red",
"food|pizza",
"food|chinese"
]
},
{
"_id": 3,
"firstName": "Rachel",
"lastName": "Jones",
"fieldValues": [
"food|pizza"
]
}
]
}
]
I've thought about doing this either as a map reduce, or with the aggregation framework.
Either way, the result would be persisted to a collection (as per the 'result' above).
My question is: which of the two would be better suited, and where would I start implementing it?
Edit
In a nutshell, the model can't easily be changed.
This isn't like a 'profile' in the traditional sense.
What I'm basically looking to do (in pseudocode) is along the lines of:
foreach profile in db.profiles.find()
    foreach otherProfile in db.profiles.find({"_id": {$ne: profile._id}})
        if profile.fieldValues matches any otherProfile.fieldValues
            // it's a match!
Obviously that kind of operation is very, very slow!
It may also be worth mentioning that this data is never displayed; it's literally just a string value that's used for 'matching'.
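To make the matching rule in the pseudocode above concrete, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB query: `profiles` is a hypothetical stand-in for the collection (names omitted), and `findMatches` is a helper name made up for illustration. Instead of comparing every profile with every other one, it first groups profile ids by fieldValue and then pairs up the ids within each group, which is roughly the shape I'd expect either a map reduce or an aggregation to take.

```javascript
// Hypothetical in-memory stand-in for db.profiles (same fieldValues as above).
const profiles = [
  { _id: 1, fieldValues: ["favouriteColour|red", "food|pizza", "food|chinese"] },
  { _id: 2, fieldValues: ["favouriteColour|blue", "food|pizza", "food|mexican", "pets|yes"] },
  { _id: 3, fieldValues: ["food|pizza"] }
];

// findMatches is an illustrative helper, not part of any MongoDB API.
// Step 1: build an inverted index from each fieldValue to the ids that have it.
// Step 2: for every value shared by two or more profiles, emit each unordered
//         pair of ids once, collecting the values they have in common.
function findMatches(docs) {
  const idsByValue = new Map();
  for (const doc of docs) {
    // A Set guards against the same value appearing twice in one profile.
    for (const value of new Set(doc.fieldValues)) {
      if (!idsByValue.has(value)) idsByValue.set(value, []);
      idsByValue.get(value).push(doc._id);
    }
  }

  const pairs = new Map(); // key "a,b" -> { profiles: [a, b], shared: [...] }
  for (const [value, ids] of idsByValue) {
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) {
        const key = ids[i] + "," + ids[j];
        if (!pairs.has(key)) {
          pairs.set(key, { profiles: [ids[i], ids[j]], shared: [] });
        }
        pairs.get(key).shared.push(value);
      }
    }
  }
  return Array.from(pairs.values());
}

console.log(JSON.stringify(findMatches(profiles), null, 2));
```

With the three sample documents, every pair shares "food|pizza", so this sketch reports all three pairs (1 and 2, 1 and 3, 2 and 3). Values held by only one profile never produce any comparisons, which is where the saving over the nested loop comes from.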