0

I have a MongoDB database with approximately 50 collections, and this may increase in the future. Each collection will have between 5 and 11 fields.

My question is: how do I optimize MongoDB so that I do not waste storage space because of field names like `superLongCollectionFieldName`? How are characters/words counted when storing the data?

Let's say I have a field called `userID` and another field called `IP`. Do both take up the full size of the bit block?

floss
  • 2,603
  • 2
  • 20
  • 37
  • 1
    The field names are not relevant to data storage size at all, unless you have millions of different field names. – Bergi Jan 28 '19 at 19:18
  • @Bergi I won't have several `fieldnames`, but I will probably have several million `fieldvalues`. With that, the JSON data will increase the file size because the keys are repeated in each object of the array. – floss Jan 28 '19 at 19:22
  • 1
    MongoDB (like any other document-oriented database) stores the field names only once, no matter how often you use them or how many values they hold. – Bergi Jan 28 '19 at 19:25
  • So that means there is only one instance of the `fieldvalue`, and the only time the MongoDB size increases is when data is inserted... Which means that new data inserts do not add the constant key `fieldvalue` again... am I getting it correct? – floss Jan 28 '19 at 21:59
  • 1
    Yes, exactly, the `superLongCollectionFieldName` is used only for serialisation, not in storage. – Bergi Jan 28 '19 at 22:03
  • @Bergi FYI: MongoDB (as at 4.0) does not maintain a central catalog of field names: field names are stored in each document so documents are self-describing in a distributed deployment. The BSON document MongoDB sends over the wire is the same as the document that is stored, although technical details of the on-disk format may vary by storage engine implementation. For example, the WiredTiger storage engine supports data compression as well as index prefix compression. The older (and now deprecated) MMAP storage engine does not have any support for compression. – Stennie Jan 29 '19 at 05:54
  • @Stennie Oops, I have overestimated MongoDB. Should I better delete my comments? – Bergi Jan 29 '19 at 12:12

1 Answer

2

The overall storage required for your data will depend on many use case specific factors including schema, indexes, how compressible the data is, and your data update/deletion patterns. The length of field names does not significantly affect index size (since indexes only store key values and document locations), but long names may have some impact on storage usage. The best way to guesstimate storage usage would be to generate some representative test data using a data generator or by extrapolating from existing data.
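
For example, here is a rough sketch using the Python driver (pymongo) that inserts a batch of synthetic documents shaped like your real data and then compares the uncompressed data size against the on-disk storage size reported by `collStats`. The database, collection, and field names below are placeholders; adjust them to match your schema.

```python
import random
import string

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["sample_docs"]
coll.drop()  # start from an empty collection so the numbers are comparable

def random_doc():
    # Roughly mimic a document with a handful of fields, as in the question.
    return {
        "userID": random.randint(1, 10_000_000),
        "IP": ".".join(str(random.randint(0, 255)) for _ in range(4)),
        "superLongCollectionFieldName": "".join(
            random.choices(string.ascii_letters, k=32)
        ),
    }

coll.insert_many([random_doc() for _ in range(100_000)])

# collStats reports both the uncompressed BSON size and the on-disk size,
# so you can see the effect of WiredTiger's block compression.
stats = client["testdb"].command("collStats", "sample_docs")
print("uncompressed data size (bytes):", stats["size"])
print("on-disk storage size (bytes):  ", stats["storageSize"])
print("total index size (bytes):      ", stats["totalIndexSize"])
```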

MongoDB (as at 4.0) does not maintain a central catalog of field names: field names are stored in each document so documents are self-describing in a distributed deployment. In all modern versions of MongoDB (3.2+) data is compressed by default so the size of field names is not a typical concern for most use cases.
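
To see why field names still count toward the uncompressed document size, you can compare the BSON size of the same values stored under long and short names. This small sketch assumes the `bson` package that ships with pymongo (3.9+):

```python
import bson

long_names = {"superLongCollectionFieldName": 12345, "anotherVerboseFieldName": "10.0.0.1"}
short_names = {"u": 12345, "ip": "10.0.0.1"}

print(len(bson.encode(long_names)))   # larger: the name bytes are stored in every document
print(len(bson.encode(short_names)))  # smaller: same values, shorter names
```

On disk, WiredTiger's block compression largely hides this repetition, which is why short field names are rarely worth the loss of readability.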

You could implement a mapping to shorter names in application code, but that adds translation overhead and reduces the clarity of the documents stored on the server. For more discussion, see: SERVER-863: Tokenize the field names.
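
If you did go that route, the mapping would live entirely in your application. A minimal, hypothetical sketch (the dictionary and helper names here are illustrative, not a MongoDB feature):

```python
# Application-level rename layer: long, readable names in code,
# short names in the stored documents.
FIELD_MAP = {
    "superLongCollectionFieldName": "slc",
    "userID": "u",
    "IP": "ip",
}
REVERSE_MAP = {short: long for long, short in FIELD_MAP.items()}

def to_stored(doc: dict) -> dict:
    """Rename fields to their short forms before inserting."""
    return {FIELD_MAP.get(name, name): value for name, value in doc.items()}

def from_stored(doc: dict) -> dict:
    """Restore the readable names after reading a document back."""
    return {REVERSE_MAP.get(name, name): value for name, value in doc.items()}
```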

Stennie
  • 63,885
  • 14
  • 149
  • 175