
I found this answer - Answer link

    db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ]);

I want to do the same thing, but with the first 15000 documents in the collection. I couldn't find how to apply a limit to such a query (I tried using $limit: 15000, but it doesn't recognize $limit).

Also, when I tried -

    db.subset.insert(db.full_set.find({}).limit(15000).toArray())

it says there is no function toArray() for the output type cursor.

How can I accomplish this?

    Did you try `db.full_set.aggregate([ { $match: { date: "20120105" } }, { $limit : 15000 }, { $out: "subset" } ]);`? – chridam Oct 10 '16 at 13:32
  • I am running this in Jupyter notebook, it says "$" is not a valid syntax. – Darpan Oct 10 '16 at 14:17

1 Answer


Well, in Python this is how things work: $limit needs to be wrapped in quotes (pipeline operators are dictionary keys), and you need to create a pipeline and execute it as a command.

In my code -

    pipeline = [{'$limit': 15000}, {'$out': 'destination_collection'}]
    db.command('aggregate', 'source_collection', pipeline=pipeline)

You need to wrap everything in quotes, including your source and destination collection names. And in db.command, db is the object for your database (i.e. dbclient.database_name).
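For completeness, here is a minimal self-contained sketch of the same call; the connection string, database, and collection names are placeholders to adjust for your setup, and note that on MongoDB 3.6+ the raw aggregate command also needs an explicit cursor argument:

    from pymongo import MongoClient

    # Placeholder connection string and database name - adjust to your deployment.
    client = MongoClient('mongodb://localhost:27017')
    db = client.database_name  # db is the database object mentioned above

    pipeline = [
        {'$limit': 15000},                   # keep only the first 15000 documents
        {'$out': 'destination_collection'},  # write the result to a new collection
    ]

    # Raw command form; MongoDB 3.6+ also requires an explicit cursor argument.
    db.command('aggregate', 'source_collection', pipeline=pipeline, cursor={})

    # Equivalent higher-level helper that runs the same pipeline:
    # db.source_collection.aggregate(pipeline)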

As per this answer -

It works about 100 times faster than forEach, at least in my case. This is because the entire aggregation pipeline runs in the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back. This has a performance penalty, even if the server and client are on the same machine.
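For comparison, the slower client-side pattern that quote describes would look roughly like this in pymongo (same placeholder collection names as above):

    # Every document travels server -> client -> server:
    docs = db.source_collection.find().limit(15000)
    db.destination_collection.insert_many(docs)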

The one that really helped me figure this answer out - Reference 1. And the official documentation.
