27

In the MongoDB docs the author mentions it's a good idea to shorten property names:

Use shorter field names.

and in an old blog post from How To Node (offline as of April 2022):

....oft-reported issue with mongoDB is the size of the data on the disk... each and every record stores all the field-names .... This means that it can often be more space-efficient to have properties such as 't', or 'b' rather than 'title' or 'body', however for fear of confusion I would avoid this unless truly required!

I am aware of solutions of how to do it. I am more interested in when is this truly required?

raam86
  • 6,785
  • 2
  • 31
  • 46
  • Why not just have a source code version and a production version with shortened property names generated automatically? Create the production version from the source when pushing out an update? – TheZ Oct 08 '12 at 23:28
  • 4
    From what I read, it looks like the author mentions it's *not* a good idea to shorten property names. I imagine "truly required" means "I only have 20 bytes of storage, so I have to shorten the property name to fit" – NullUserException Oct 08 '12 at 23:28
  • @TheZ This is about Mongo, not JS. – NullUserException Oct 08 '12 at 23:29
  • @NullUserException The concept of minification (and the word itself) is often/usually applied to JS, but the idea is universal. I used the word for lack of a better one, edited for non-minification word usage. – TheZ Oct 08 '12 at 23:30
  • 1
    @TheZ Unless there are tools that automate minification for Mongo, I don't see how this could be done safely. – NullUserException Oct 08 '12 at 23:31
  • @NullUserException—yes. Also, in modern operating systems, disc compression can be applied independently of the application by the OS so the application doesn't need to deal with it. – RobG Oct 08 '12 at 23:37
  • 1
    [At 5¢/GB, probably not.](http://www.amazon.com/s/ref=nb_sb_noss_2?field-keywords=2%20tb) – josh3736 Oct 08 '12 at 23:47
  • 1
    Vote for [SERVER-863](https://jira.mongodb.org/browse/SERVER-863). Single biggest improvement MongoDB can make, IMO, as it will have a positive impact on all users. No more fussing over long field names, and significant savings in storage costs (and potentially bandwidth too, if implemented in the driver). All taken care of behind the scenes. – FlappySocks Mar 31 '13 at 11:44

7 Answers

26

To quote Donald Knuth:

Premature optimization is the root of all evil (or at least most of it) in programming.

Build your application however seems most sensible, maintainable and logical. Then, if you have performance or storage issues, deal with those that have the greatest impact until either performance is satisfactory or the law of diminishing returns means there's no point in optimising further.

If you are uncertain of the impact of particular design decisions (like long property names), create a prototype to test various hypotheses (like "will shorter property names save much space"). Don't expect the outcome of testing to be conclusive, however it may teach you things you didn't expect to learn.
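As a tiny prototype of that kind of test (a sketch with hypothetical field names; `json.dumps` is used as a rough stand-in for BSON, since both formats repeat every field name in every document):

```python
import json

# Hypothetical schema: the same record with long and short field names.
long_doc = {"title": "Hello", "body": "World", "created": "2012-10-08"}
short_doc = {"t": "Hello", "b": "World", "c": "2012-10-08"}

long_size = len(json.dumps(long_doc))
short_size = len(json.dumps(short_doc))

# The per-document saving is exactly the difference in field-name lengths.
saved_per_doc = long_size - short_size  # 13 bytes for this schema
```

Multiply by your expected document count to estimate the uncompressed impact, then decide whether that number is worth caring about.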

RobG
  • 142,382
  • 31
  • 172
  • 209
  • 31
    True *but* sometimes you can anticipate a problem. If you know that a) your database will see heavy load, b) the collection will grow to contain a large number of records and c) the size of the field names is large, relative to the size of the data in the collection then you can reasonably predict a problem. Consider that if you find you have this problem only *after* creating a large number of records, MongoDB will make it painful to correct, possibly even requiring downtime for any app using the db. – itsbruce Oct 09 '12 at 00:51
  • 2
    Of course the design process must consider non-functional requirements such as the host environment, available space, performance, etc. The design should be reviewed to ensure it is likely to meet those requirements; testing will indicate whether the application does or not, and whether (and what) remedial action may be required long before it goes into production. That isn't premature optimisation, it's testing against requirements. – RobG Oct 09 '12 at 01:09
  • 2
    I simply think this is not a good answer. The balance of small optimizations with great returns always wins. I feel it now, after having terabytes of data in a DB just because I have large keys, and the cost of optimizing now is huge. – Alexandru R Oct 26 '20 at 07:25
  • 1
    I couldn't agree more with the comments by @Alexandru. You can teach people what each short name means, but you can't simply rewrite code and change the database without huge costs once it gets out of hand in terms of performance. For users of any software, it is the performance that matters. It does not matter how software engineers achieve it. – mvsagar Feb 05 '21 at 05:56
17

Keep the priority for meaningful names above the priority for short names unless your own situation and testing provides a specific reason to alter those priorities.

As mentioned in the comments of SERVER-863, if you're using MongoDB 3.0+ with the WiredTiger storage option with snappy compression enabled, long field names become even less of an issue as the compression effectively takes care of the shortening for you.
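The effect is easy to sketch with standard-library tools (zlib here as a stand-in for snappy, which isn't in the Python standard library, and hypothetical documents): compressing a batch of documents collapses the repeated field names, so the long-name and short-name payloads end up much closer in size than their raw forms.

```python
import json
import zlib

# Hypothetical documents; only the field-name lengths differ.
long_docs = [{"countryNameMaster": "Andorra", "countryPopulationNumber": i}
             for i in range(1000)]
short_docs = [{"name": "Andorra", "pop": i} for i in range(1000)]

raw_long = json.dumps(long_docs).encode()
raw_short = json.dumps(short_docs).encode()

# Repeated field names compress extremely well, so the compressed
# sizes are far closer together than the raw sizes.
comp_long = len(zlib.compress(raw_long))
comp_short = len(zlib.compress(raw_short))
```

Uncompressed, the long names cost real space on every document; compressed, most of that overhead disappears, which is the point made about WiredTiger with snappy.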

JohnnyHK
  • 305,182
  • 66
  • 621
  • 471
10

Bottom line up front: keep it as compact as it still stays meaningful.

I don't think this is ever truly required to be shortened to one-letter names. Still, you should shorten them as much as you feel comfortable with. Say you have a user's name: {FirstName, MiddleName, LastName}. You may be good to go with name: {first, middle, last}. If you feel comfortable, you may even be fine with name: {f, m, l}.
You should use short names, as long names consume disk space and memory and thus may somewhat slow down your application (fewer objects fit in memory, lookups are slower due to the bigger size, and queries take longer because seeking over the data takes longer).
Good schema documentation can tell the developer that t stands for town and not for title. Depending on your stack, you may even be able to shield developers from working with these shortcuts through helper utils that map them.
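A minimal sketch of such a helper util (the schema and names here are hypothetical): one dictionary documents the mapping, and two functions translate between readable names in application code and short names on disk.

```python
# Hypothetical schema map: readable name -> stored short name.
FIELD_MAP = {"first": "f", "middle": "m", "last": "l"}
REVERSE_MAP = {v: k for k, v in FIELD_MAP.items()}

def to_storage(doc):
    """Shorten field names before writing to MongoDB."""
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

def from_storage(doc):
    """Restore readable field names after reading."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}
```

With something like this at the data-access layer, the rest of the codebase never sees the one-letter names, which addresses the "fear of confusion" objection.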

Finally, I would say there's no guideline for when and how much you should shorten your schema names. It depends highly on your environment and requirements. But you're fine keeping it compact if you can supply good documentation explaining everything and/or offer utils to ease the life of developers and admins. Admins in particular are likely to interact with mongodb directly, so good documentation shouldn't be missing.

philnate
  • 1,506
  • 2
  • 21
  • 39
4

I performed a little benchmark: I uploaded 252 rows of data from an Excel file into two collections, testShortNames and testLongNames, as follows:

Long Names:

{
    "_id": ObjectId("6007a81ea42c4818e5408e9c"),
    "countryNameMaster": "Andorra",
    "countryCapitalNameMaster": "Andorra la Vella",
    "areaInSquareKilometers": 468,
    "countryPopulationNumber": NumberInt("77006"),
    "continentAbbreviationCode": "EU",
    "currencyNameMaster": "Euro"
}

Short Names:

{
    "_id": ObjectId("6007a81fa42c4818e5408e9d"),
    "name": "Andorra",
    "capital": "Andorra la Vella",
    "area": 468,
    "pop": NumberInt("77006"),
    "continent": "EU",
    "currency": "Euro"
}

I then got the stats for each, saved them in disk files, and did a "diff" on the two files:

pprint.pprint(db.command("collstats", dbCollectionNameLongNames))

The stats contain two variables of interest: size and storageSize. My reading showed that storageSize is the amount of disk space used after compression, and size is basically the uncompressed size. The storageSize was identical for both collections. Apparently the WiredTiger engine compresses field names quite well.

I then ran a program to retrieve all data from each collection, and checked the response time.

Even though it was a sub-second query, the long names consistently took about 7 times longer. It of course will take longer to send the longer names across from the database server to the client program.

-------LongNames-------
Server Start DateTime=2021-01-20 08:44:38
Server End   DateTime=2021-01-20 08:44:39
StartTimeMs= 606964546  EndTimeM= 606965328
ElapsedTime MilliSeconds= 782
-------ShortNames-------
Server Start DateTime=2021-01-20 08:44:39
Server End   DateTime=2021-01-20 08:44:39
StartTimeMs= 606965328  EndTimeM= 606965421
ElapsedTime MilliSeconds= 93

In Python, I just did the following (I had to actually loop through the items to force the reads, otherwise the query returns only the cursor):

results = dbCollectionLongNames.find(query)
for result in results:
    pass  # iterating forces the driver to actually fetch each document
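The elapsed-time figures above can be produced with a small harness along these lines (a sketch: `time.perf_counter` stands in for whatever clock the original program used, and any iterable can stand in for the pymongo cursor):

```python
import time

def time_full_read(cursor):
    """Consume every document from `cursor` and return elapsed milliseconds.

    `cursor` can be a pymongo cursor or, as here, any iterable --
    iterating is what forces the reads to actually happen.
    """
    start = time.perf_counter()
    for _ in cursor:
        pass
    return (time.perf_counter() - start) * 1000.0

elapsed_ms = time_full_read(iter(range(100_000)))  # stand-in for a real cursor
```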
NealWalters
  • 17,197
  • 42
  • 141
  • 251
3

Adding my 2 cents on this:

Long-named attributes (or "AbnormallyLongNameAttributes") can be avoided while designing the data model. In my previous organisation we tested a short-named-attributes strategy, using organisation-defined 4-5 letter encoded strings, e.g.:

  1. First Name = FSTNM
  2. Last Name = LSTNM
  3. Monthly Profit Loss Percentage = MTPCT
  4. Year on Year Sales Projection = YOYSP, and so on.

While we observed an improvement in query performance, largely due to the reduced size of data transferred over the network and (since we used Java with MongoDB) the reduced length of keys in the MongoDB document/Java Map heap space, the overall improvement in performance was less than 15%.

In my personal opinion, this was a micro-optimization that came at the additional cost (and a huge headache) of designing and maintaining an extra system for managing a data-attribute dictionary for each data model. This system was required to provide organisation-wide transparency when debugging the application or answering client queries.

If you find yourself in a position where an up-to-20% increase in performance from this strategy is lucrative to you, maybe it is time to scale up your MongoDB servers, choose some other data-modelling/querying strategy, or choose a different database altogether.

Akshay
  • 41
  • 3
0

If you're storing verbose XML, ameliorating that with custom names could be very important. A user comment on the SERVER-863 ticket described such a case: "I'm storing externally-defined XML objects, with verbose naming: the fieldnames are, perhaps, 70% of the total record size. So fieldname tokenization could be a giant win, both in terms of I/O and memory efficiency."

AnneTheAgile
  • 9,932
  • 6
  • 52
  • 48
0

Collection with shorter field names: InsertCompress. Collection with longer field names: InsertNormal.

I performed this on our Mongo sharded cluster, and the analysis shows:

  1. There is around a 10-15% gain with shorter names while saving, which seems to be purely down to network latency. I used bulk inserts with multiple threads, so single inserts could save more.

  2. My average data size for InsertCompress is 280 B and for InsertNormal is 350 B, and I inserted 25 million records. So InsertNormal shows 8.1 GB and InsertCompress shows 6.6 GB. This is data size.

  3. Surprisingly, the index size is 2.2 GB for the InsertCompress collection and 2 GB for the InsertNormal collection.

  4. Likewise, the storage size is 2.2 GB for the InsertCompress collection, while for InsertNormal it is around 1.6 GB.

Overall, apart from network latency, nothing is gained in storage, so it's not worth the effort to go in this direction just to save storage. Consider it only if you have much bigger documents where smaller field names would save a lot of data.
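A back-of-the-envelope check of the data-size numbers above (the byte figures and record count come from the answer; the GiB conversion is mine):

```python
# 25 million documents at the reported average sizes should land
# close to the observed totals of 8.1 GB and 6.6 GB.
docs = 25_000_000
avg_normal_bytes = 350    # InsertNormal
avg_compress_bytes = 280  # InsertCompress

normal_gb = docs * avg_normal_bytes / 1024**3    # ~8.15 GiB
compress_gb = docs * avg_compress_bytes / 1024**3  # ~6.52 GiB

# Relative saving in raw data size before any index/storage effects.
saving_pct = 100 * (1 - avg_compress_bytes / avg_normal_bytes)  # ~20%
```

So the raw data-size saving is about 20%, consistent with the reported totals, while the index and storage numbers show that the on-disk benefit can still evaporate after compression.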

Mahesh Malpani
  • 1,782
  • 16
  • 27