In an application I use the concept of buckets to store objects. All buckets are empty at creation time. Some of them may fill up to their maximum capacity of 20 objects within 2 hours, others within 6 months. Each object's size is pretty much fixed, i.e. I don't expect object sizes to differ by more than 10%, so the sizes of full buckets shouldn't differ much either. The implementation looks similar to this:

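import java.util.List;
import org.springframework.data.mongodb.core.mapping.Document;

@Document
public class MyBucket {
  // holds at most 20 objects (the bucket's maximum capacity)
  private List<MyObject> objects;
}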

One approach to keeping the padding factor low would be to prepopulate each bucket with dummy data. Two options come to mind:

  1. Create the bucket with dummy data, save it, then reset its content and save it again.
  2. Create the bucket with dummy data and flag it as "pristine". On the first write the flag is set to false and the data is reset.

The disadvantages are obvious: option 1 requires two DB writes, and option 2 requires extra (non-business) code in my entities.
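To make the cost of option 2 concrete, here is a minimal sketch of what the extra entity code might look like. It is plain Java without the Spring Data annotations so that it runs on its own; the class name `PristineBucket`, the `String` payload type, and the placeholder value are all hypothetical and stand in for `MyBucket` and `MyObject`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for MyBucket; @Document is omitted so the sketch is self-contained.
class PristineBucket {
    static final int MAX_CAPACITY = 20;

    // What would be persisted: either dummy padding or real objects.
    private final List<String> objects = new ArrayList<>();
    // True while the list still holds only dummy data.
    private boolean pristine = true;

    // Fill the bucket with placeholders so the stored document starts near its full size.
    PristineBucket(String placeholder) {
        for (int i = 0; i < MAX_CAPACITY; i++) {
            objects.add(placeholder);
        }
    }

    // On the first real write, drop the dummy data and clear the flag.
    void add(String object) {
        if (pristine) {
            objects.clear();
            pristine = false;
        }
        if (objects.size() >= MAX_CAPACITY) {
            throw new IllegalStateException("bucket is full");
        }
        objects.add(object);
    }

    // Business code never sees the dummy data.
    List<String> getObjects() {
        if (pristine) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(objects);
    }

    // Number of entries that would actually be stored, dummy data included.
    int storedSize() {
        return objects.size();
    }

    boolean isPristine() {
        return pristine;
    }

    public static void main(String[] args) {
        PristineBucket bucket = new PristineBucket("placeholder");
        System.out.println(bucket.storedSize() + " stored, " + bucket.getObjects().size() + " visible");
        bucket.add("first real object");
        System.out.println(bucket.storedSize() + " stored, " + bucket.getObjects().size() + " visible");
    }
}
```

The flag and the guarded getter are exactly the non-business code mentioned above: every accessor that exposes the list has to know about the pristine state.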

Probably I won't get off cheaply with any solution. Nevertheless, any real-life experience with that issue, any best practices or hints?

Setup: Spring Data MongoDB 1.9.2, MongoDB 3.2

Jan B.
  • Could you explain in more detail what the issue really is? What problem are you solving with the padding factor? – Andriy Simonov Aug 04 '16 at 09:09
  • The scenario that I want to avoid is the following: I create 100,000 initially empty buckets within a couple of days. I know that 80% of those buckets will grow to twenty times their size at creation time within one year. If I don't prepopulate those buckets, they will quickly have a padding factor of 4, which results in very inefficient memory usage, massive relocation and wasted space. I know there are options like compact or repair, but I'd like to avoid them by telling MongoDB which document sizes it can expect. – Jan B. Aug 04 '16 at 10:08

1 Answer

As far as I understand, your main concern is the performance overhead of growing documents, which leads to document relocation and index updates. That applies to the MMAPv1 storage engine; however, since MongoDB 3.0 the WiredTiger storage engine has been available, and it does not have these issues (check the similar question).

Andriy Simonov