Array intersection in MongoDB

Question

Ok there are a couple of things going on here..I have two collections: test and test1. The documents in both collections have an array field (tags and tags1, respectively) that contains some tags. I need to find the intersection of these tags and also fetch the whole document from collection test1 if even a single tag matches.

> db.test.find();
{
    "_id" : ObjectId("5166c19b32d001b79b32c72a"),
    "tags" : [
            "a",
            "b",
            "c"
    ]
}          
> db.test1.find();
{
    "_id" : ObjectId("5166c1c532d001b79b32c72b"),
    "tags1" : [
            "a",
            "b",
            "x",
            "y"
    ]
}
> db.test.find().forEach(function(doc){db.test1.find({tags1:{$in:doc.tags}})});

Surprisingly this doesn't return anything. However when I try it with a single document, it works:

> var doc = db.test.findOne();
> db.test1.find({tags1:{$in:doc.tags}});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a", "b", "x", "y" ] }

But this is part of what I need. I need intersection as well. So I tried this:

> db.test1.find({tags1:{$in:doc.tags}},{"tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a" ] }

But it returned just "a" whereas "a" and "b" both were in tags1. Does positional operator return just the first match? Also, using $in won't exactly give me an intersection..How can I get an intersection (should return "a" and "b") irrespective of which array is compared against the other.

Now say there's an operator that can do this..

> db.test1.find({tags1:{$intersection:doc.tags}},{"tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1" : [ "a", "b" ] }

My requirement is, I need the entire tags1 array PLUS this intersection, in the same query like this:

> db.test1.find({tags1:{$intersection:doc.tags}},{"tags1":1, "tags1.$":1});
{ "_id" : ObjectId("5166c1c532d001b79b32c72b"), "tags1": [ "a", "b", "x", "y" ],
"tags1" : [ "a", "b" ] }

But this is an invalid json. Is renaming key possible, or this is possible only through aggregation framework (and across different collections?)? I tried the above query with $in. But it behaved as if it totally ignored "tags:1" projection.

PS: I am going to have at least 10k docs in test1 and very few (<10) in test. And this query is in real-time, so I want to avoid mapreduce :)

Thanks for any help!

To comment on your first question, "db.test1.find({tags1:{$in:doc.tags}})" is going to return a cursor. To print to the screen you should append something like the following to the second find: .forEach(function(doc){print(tojson(doc));}) — James Wahlin, Apr 12 '13 at 12:53
Thanks @JamesWahlin ! I knew it was something silly :P Your quick comment allowed me to at least proceed further in finding a solution. — Aafreen Sheikh, Apr 21 '13 at 07:39

score 1 · Answer 1 · answered Jun 24 '19 at 19:25

In newer versions you can use aggregation to accomplish this.

db.test.aggregate(
    {
        $match: {
            tags1: {
                $in: doc.tags
            }
        }
    },
    {
        $project: {
            tags1: 1,
            intersection: {
                $setIntersection: [doc.tags, "$tags1"]
            }
        }
    }
);

As you can see, the match portion is exactly the same as your initial find() query. The project portion generates the result fields. In this case, it selects tags1 from the matching documents and also creates intersection from the input and the matching docs.

philnate · Answer 2 · 2013-09-24T20:14:02.867

If you want to have this in realtime you should consider to move away from Serverside Javascript which is only run with one thread and should be quite slow (single threaded) (this is no longer true for v2.4, http://docs.mongodb.org/manual/core/server-side-javascript/).

The positional operator only returns the first matching/current value. Without knowing the internal implementation, from the point of performance it doesn't even makes sense to look for further matching criteria if the document was already evaluated as match. So I doubt that you can go for this.

I don't know if you need the cartesian product for your search, but I would consider joining your few test one document tags into one and then have some $in search for it on test1, returning all matching documents. On your local machine you could have multiple threads which generate the intersection for your document.

Depending on how frequent your test1 and test collection changes, you're performing this query you might precalculate this information. Which would allow to easily do a query on the field which contains the intersection information.

The document is invalid because you have two fields names tags1

why is server-side JS relevant? he's not doing any server-side JS in his attempts - the shell runs js locally on the client machine. — Asya Kamsky, Sep 24 '13 at 02:42
Yes you're right the javascript executed is locally. I got on the wrong track here. — philnate, Sep 24 '13 at 20:12

score -1 · Answer 3 · answered Apr 15 '13 at 14:47

Mongo doesn't have any inherent ability to retrieve array intersections. If you really need to use ad-hoc querying get the intersection on the client side.

On the other hand, consider using Map-Reduce and storing it's output as a collection. You can augment the returned objects in the finalize section to add the intersecting tags. Cron MR to run every few seconds. You get the benefit of a permanent collection you can query from on the client side.

Array intersection in MongoDB

3 Answers3