
I know the cardinal rule of SE is to not ask a question without giving examples of what you've already tried, but in this case I can't find where to begin. I've looked at the documentation for MongoDB and it looks like there are only two ways to calculate storage usage:

  1. db.collection.stats() returns statistics about the entire collection. In my case I need to know the amount of storage being used by a subset of the data within a collection (the data for a particular user).
  2. Object.bsonsize(<document>) returns the storage size of a single record, which would require a cursor function to calculate the size of each document, one at a time. My only concern with this approach is performance with large amounts of data. If a single user has tens of thousands of documents this process could take too long.
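For reference, the per-document approach from option 2 could be sketched roughly as below. This is only an illustration under stated assumptions: `totalSize` and `approxSize` are hypothetical names, and the sizing function is injected so the same loop can run with Object.bsonsize in the legacy mongo shell or with a stand-in elsewhere. It still visits every matching document, so the performance concern above applies.

```javascript
// Hypothetical helper: sums per-document sizes over anything that has a
// forEach method (an array, or a cursor in the legacy mongo shell).
function totalSize(docs, sizeOf) {
  let total = 0;
  let count = 0;
  docs.forEach((doc) => {
    total += sizeOf(doc);
    count += 1;
  });
  return { total, count };
}

// In the legacy mongo shell the call would look roughly like this
// (`records` and `someUserId` are placeholder names):
//   totalSize(db.records.find({ userId: someUserId }), Object.bsonsize)

// Stand-in sizing function so the sketch runs in plain Node.js.
// It measures the UTF-8 JSON form, which is NOT the BSON size.
const approxSize = (doc) => Buffer.byteLength(JSON.stringify(doc), 'utf8');

const usage = totalSize([{ a: 1 }, { a: 22 }], approxSize);
console.log(usage);
```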

Does anyone know of a way to calculate the aggregate document size of a set of records within a collection efficiently and accurately?

Thanks for the help.

Brian Shamblen

1 Answer


This may not be the most efficient or accurate way to do it, but I ended up using a Mongoose plugin to get the size of the JSON representation of the document before it's saved:

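const mongoose = require('mongoose');

// Mongoose plugin: adds userId and recordSize fields, and records the
// length of each document's JSON representation before it is saved.
module.exports = exports = function defaultPlugin(schema, options) {
    schema.add({
        userId: { type: mongoose.Schema.Types.ObjectId, ref: "User", required: true },
        recordSize: Number
    });

    schema.pre('save', function (next) {
        // String length counts characters, so this approximates the stored size.
        this.recordSize = JSON.stringify(this).length;
        next();
    });
};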

This will convert the document to a JSON representation, get its length, then store the size in the document itself. I understand that this will actually add a tiny bit of extra storage to record the size, but it's the best I could come up with.
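One caveat on accuracy: a JavaScript string's length counts UTF-16 code units rather than bytes, so non-ASCII text is undercounted. A minimal sketch of a byte-based variant, assuming Node.js (`jsonByteSize` is a hypothetical helper name, and the result is still only an approximation of the BSON size MongoDB stores):

```javascript
// Hypothetical helper: size in bytes of a value's UTF-8 encoded JSON form.
// This approximates, but does not equal, the BSON size on disk.
function jsonByteSize(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

const sample = { name: 'José' };
const charLength = JSON.stringify(sample).length; // UTF-16 code units
const byteLength = jsonByteSize(sample);          // UTF-8 bytes

console.log(charLength, byteLength); // prints: 15 16
```

Inside the pre-save hook, `jsonByteSize(this)` could stand in for `JSON.stringify(this).length` if byte counts matter more than character counts.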

Then, to generate a storage report, I'm using a simple aggregate call to get the sum of all of the recordSize values in the collection, filtered by userId:

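// userId must be an ObjectId here: aggregate() does not cast $match values
// against the schema the way find() does.
mongoose.model('YourCollectionName').aggregate([
    {
        $match: {
            userId: userId
        }
    },
    {
        $group: {
            _id: null,
            recordSize: { $sum: '$recordSize' },
            recordCount: { $sum: 1 }
        }
    }
], function (err, results) {
    // Do something with your results
});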
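
When the $match stage selects no documents, the $group stage emits nothing, so `results` is an empty array rather than a row of zeros. A small sketch of how the callback might normalize that case (`summarizeUsage` is a hypothetical helper, not part of Mongoose):

```javascript
// Hypothetical helper: turns the aggregate() result array into a single
// summary object, falling back to zeros when the user has no records.
function summarizeUsage(results) {
  if (!Array.isArray(results) || results.length === 0) {
    return { recordSize: 0, recordCount: 0 };
  }
  const { recordSize, recordCount } = results[0];
  return { recordSize, recordCount };
}

// Example with the shape produced by the $group stage above.
const report = summarizeUsage([{ _id: null, recordSize: 2048, recordCount: 3 }]);
console.log(report);
```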
Brian Shamblen
  • Before I read your answer, I was thinking "perhaps store the size during save/update in a separate stats table". So I'd agree with this solution (after reading other posts with similar challenges). As for the added storage, trading storage for processing power is encouraged in the NoSQL corner of the world, and adding a few bytes to potentially large documents is negligible. – scipilot Jun 08 '17 at 11:22
  • What's the name of the plugin? I'd like to use your technique but I couldn't gather it from your example. – scipilot Jun 08 '17 at 11:23
  • Oh I see - you wrote a plugin! Got it. – scipilot Jun 08 '17 at 11:25