2

[MongoDB shell or pyMongo] I would like to know how to efficiently convert one record in a collection with an array in one field, to multiple records in say anew collection. So far, the only solution, I've been able to achieve is iterating the records one by one and then iterating the array in the field I want and do individual inserts. I'm hoping there's a more efficient way to do this.

Example:

I want to take a collection in MongoDB with structure similar to :

[{
    "_id": 1,
    "points": ["a", "b", "c"]
}, {
    "_id": 2,
    "points": ["d"]
}]

and convert it to something like this:

[{
    "_id": 1,
    "points": "a"
}, {
    "_id": 2,
    "points": "b"
}, {
    "_id": 3,
    "points": "c"
}, {
    "_id": 4,
    "points": "d"
}]

2 Answers2

0

Assuming you're ok with auto-generated _id values in the new collection, you can do this with an aggregation pipeline that uses $unwind to unwind the points array and $out to output the results to a new collection:

db.test.aggregate([
    // Duplicate each doc, one per points array element
    {$unwind: '$points'},

    // Remove the _id field to prompt regeneration as there are now duplicates
    {$project: {_id: 0}},

    // Output the resulting docs to a new collection, named 'newtest'
    {$out: 'newtest'}
])
JohnnyHK
  • 305,182
  • 66
  • 621
  • 471
  • Thanks a lot. This was exactly what I was looking for. An additional question, if I may - can I use unwind on two or more fields, given that the two fields always have the same number of elements in their arrays for any given record ? example: '[{ "points": ["a", "b", "c"], "scores": [1,2,3] }, { "points": ["d"], scores:[4] }]' – Deepan Anbarasan Oct 10 '18 at 12:48
  • I found the answer to my second question in the comments. The answer can be found here [link](https://stackoverflow.com/questions/39426022/mongodb-show-children-items-in-one-to-many-relationship) – Deepan Anbarasan Oct 10 '18 at 12:57
0

Here's another version which can be expected to perform worse than @JohnnyHK's solution because of a second $unwind and a potentially massive $group but it generates integer IDs based on some order that you can specify in the $sort stage:

db.collection.aggregate([{
    // flatten the "points" array to get individual documents
    $unwind: { "path": "$points" },
}, {
    // sort by some criterion
    $sort: { "points": 1 }
}, {
    // throw all sorted "points" in the very same massive array
    $group: {
        _id: null,
        "points": { $push: "$points" },
    }
}, {
    // flatten the massive array making each document's position index its `_id` field
    $unwind: {
        "path": "$points",
        includeArrayIndex: "_id"
    }
} , {
    // write results to new "result" collection
    $out: "result"
}], {
    // make sure we do not run into memory issues
    allowDiskUse: true
})
dnickless
  • 10,733
  • 1
  • 19
  • 34