69

I have a set like so

{date: 20120101}
{date: 20120103}
{date: 20120104}
{date: 20120005}
{date: 20120105}

How do I save a subset of those documents with the date '20120105' to another collection?

i.e db.subset.save(db.full_set.find({date: "20120105"}));

SemperFly
  • 1,563
  • 3
  • 17
  • 31

7 Answers7

104

I would advise using the aggregation framework:

db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ])

It works about 100 times faster than forEach at least in my case. This is because the entire aggregation pipeline runs in the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back. This has a performance penalty, even if the server and client are on the same machine.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
melan
  • 1,688
  • 1
  • 11
  • 8
  • just take in account that it will fail if aggregation result exceeds maximum document size (16MB) – c24b May 19 '15 at 15:19
  • 6
    $out doesn't have the usual limitation on document size, since you're writing to a collection – Kyrstellaine May 29 '15 at 22:31
  • Further evidence: I just ran a comparison with my own data and aggregate is about 50x faster than forEach. I also compared to mapreduce, and aggregate is about 5x faster. – Luke Feb 09 '16 at 15:35
  • 21
    Warning. This is a very dangerous command. It will remove anything that is on the "subset" collection. – zhaokongsheng Dec 21 '16 at 03:44
  • 6
    To clarify the comment by @zhaokongsheng, if the `$out` collection already exists "... the `$out` stage atomically replaces the existing collection with the new results collection." `$out` does not append to the existing collection. See https://docs.mongodb.com/manual/reference/operator/aggregation/out/ – Paul Feb 22 '18 at 19:38
  • do see if you guys find something to implement from https://jira.mongodb.org/browse/SERVER-12280 Also, $merge is another https://docs.mongodb.com/master/reference/operator/aggregation/merge/ and even shows the comparison with $out but when I try use it after $match, i get the error MongoError: Unrecognized pipeline stage name: '$merge' – user3003976 Aug 08 '19 at 05:18
75

Here's the shell version:

db.full_set.find({date:"20120105"}).forEach(function(doc){
   db.subset.insert(doc);
});

Note: As of MongoDB 2.6, the aggregation framework makes it possible to do this faster; see melan's answer for details.

Jonny
  • 3,807
  • 8
  • 31
  • 48
Eve Freeman
  • 32,467
  • 4
  • 86
  • 101
  • 6
    As of [2.2](http://docs.mongodb.org/manual/reference/method/db.collection.insert/), insert can take an array of docs, so you could do `var docs = ...find(...).toArray(); db.coll.insert(docs)`. I haven't found performance to be very good in either case, though – Michael Haren Mar 01 '13 at 02:58
  • 1
    Unfortunately [$out](https://docs.mongodb.org/manual/reference/operator/aggregation/out/) came to Mongo since version [2.6](https://docs.mongodb.org/manual/release-notes/2.6/), not [2.2](https://docs.mongodb.org/manual/release-notes/2.2/) – CroWell Nov 11 '15 at 11:01
20

Actually, there is an equivalent of SQL's insert into ... select from in MongoDB. First, you convert multiple documents into an array of documents; then you insert the array into the target collection

db.subset.insert(db.full_set.find({date:"20120105"}).toArray())
user3116889
  • 201
  • 2
  • 2
20

The most general solution is this:

Make use of the aggregation (answer given by @melan):

db.full_set.aggregate({$match:{your query here...}},{$out:"sample"})
db.sample.copyTo("subset")

This works even when there are documents in "subset" before the operation and you want to preserve those "old" documents and just insert a new subset into it.

Care must be taken, because the copyTo() command replaces the documents with the same _id.

chridam
  • 100,957
  • 23
  • 236
  • 235
antonimmo
  • 393
  • 4
  • 8
3

Mongodb has aggregate along with $out operator which allow to save subset into new collection. Following are the details :

$out Takes the documents returned by the aggregation pipeline and writes them to a specified collection.

  • The $out operation creates a new collection in the current database if one does not already exist.
  • The collection is not visible until the aggregation completes.
  • If the aggregation fails, MongoDB does not create the collection.

Syntax :

{ $out: "<output-collection>" }

Example A collection books contains the following documents:

{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }

The following aggregation operation pivots the data in the books collection to have titles grouped by authors and then writes the results to the authors collection.

db.books.aggregate( [
  { $group : { _id : "$author", books: { $push: "$title" } } },
    { $out : "authors" }
] )

After the operation, the authors collection contains the following documents:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

In the asked question, use following query and you will get new collection named 'col_20120105' in your database

 db.products.aggregate([
  { $match : { date : "20120105" } },
  { $out : "col_20120105" }
]);
Amitesh Bharti
  • 14,264
  • 6
  • 62
  • 62
3

There's no direct equivalent of SQL's insert into ... select from ....

You have to take care of it yourself. Fetch documents of interest and save them to another collection.

You can do it in the shell, but I'd use a small external script in Ruby. Something like this:

require 'mongo'

db = Mongo::Connection.new.db('mydb')

source = db.collection('source_collection')
target = db.collection('target_collection')

source.find(date: "20120105").each do |doc|
  target.insert doc
end
Sergio Tulentsev
  • 226,338
  • 43
  • 373
  • 367
  • This bring each document to the client, while [aggregation runs entirely on the server](https://stackoverflow.com/questions/9711529/save-subset-of-mongodb-collection-to-another-collection/25247084#25247084). – Dan Dascalescu Nov 04 '18 at 11:43
  • @DanDascalescu: note the timestamp of the answer. At that time there was no aggregation framework. – Sergio Tulentsev Nov 04 '18 at 17:20
  • Good to see you're still active on SO! Would be nice to update the answer now that it's been a while and there are better solutions. – Dan Dascalescu Nov 05 '18 at 08:10
0

You can also use $merge aggregation pipeline stage.

db.full_set.aggregate([
  {$match: {...}},
  { $merge: {
     into: { db: 'your_db', coll: 'your_another_collection' },
     on: '_id',
     whenMatched: 'keepExisting',
     whenNotMatched: 'insert'
  }}
])
gzc
  • 8,180
  • 8
  • 42
  • 62