2

Ok, so every player in my game has a document in my players collection, and each player has one string that is a serialized hash of their game state. This string can be very long or very short, and it varies a lot from player to player.

Somebody who doesn't have a ton of Mongo experience told me that I should pad every string in the collection so that they are all the same length, i.e. add tons of zeros to the end of all the short and medium game state strings.

So A) is this a good idea?

B) I'm not even sure how to find the longest game state length, so I'm not sure how far to pad them. And what if game states later exceed my padding length?

My friend said he had a mongo collection keep blowing up because of fragmentation and when he implemented padding all of his issues went away.

Oh, I doubt it matters, but my code is in PHP and uses the PHP PECL mongo driver.

Thanks for any thoughts or input!!!!!

-dave

Dave Geurts

3 Answers

2

MongoDB allocates space for documents at creation time. If the size of a document increases, the document has to be moved to a new location to accommodate the larger size. The original space is not released to the operating system. Instead, MongoDB will eventually reuse this space. Until this happens, the database may appear over-allocated, which is sometimes called fragmented.

So, what probably happened to your friend:

  • documents were inserted
  • when fields were updated, their sizes sometimes increased, and the documents therefore grew
  • documents were moved as they grew, and the database became over-allocated (what your friend called fragmented)

And by padding the fields in the documents your friend was able to ensure documents never grew in size and therefore his database never became over-allocated.
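The effect described above can be illustrated with a toy model. This is only a sketch of the idea, not how MongoDB's storage engine is implemented: `count_moves`, `exact_fit` and `padded_to` are hypothetical names, and the byte sizes are made up.

```python
def count_moves(growth_steps, initial_size, allocate):
    """Count how often a record must be relocated as its document grows.

    `allocate` maps a document size to the space reserved for it.
    """
    allocated = allocate(initial_size)
    size = initial_size
    moves = 0
    for step in growth_steps:
        size += step
        if size > allocated:
            # The document no longer fits in its slot, so it is moved
            # and the old slot is left behind for later reuse.
            allocated = allocate(size)
            moves += 1
    return moves


def exact_fit(size):
    """Reserve exactly as much space as the document needs right now."""
    return size


def padded_to(limit):
    """Reserve at least `limit` bytes, mimicking application-level padding."""
    def allocate(size):
        return max(size, limit)
    return allocate


growth = [100] * 10  # ten updates, each adding 100 bytes (made-up numbers)
moves_exact = count_moves(growth, 500, exact_fit)
moves_padded = count_moves(growth, 500, padded_to(2000))
print(moves_exact, moves_padded)
```

In this model every growing update moves the exact-fit document, while the padded one never moves as long as it stays under the padding limit. Once it exceeds the limit, padding stops helping, which is the concern raised in part B of the question.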

The padding approach is valid but it also adds complexity to the application. Typically padding is performed for fields that will eventually be created, rather than fixing the size of the values themselves, but the idea is the same. In your case it doesn't sound like padding is a great option because you cannot predict the field size.

Instead, you might consider using usePowerOf2Sizes: http://docs.mongodb.org/manual/reference/command/collMod/

This configuration automatically pads the space allocated for documents and increases the chances that MongoDB can reuse freed space efficiently, at the cost of a slightly larger database.
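As a rough illustration of what power-of-2 allocation means, here is a small sketch. The `power_of_2_allocation` helper and its 32-byte minimum are assumptions made for the example, not MongoDB's exact internals, and `players` is just the collection name from the question.

```python
def power_of_2_allocation(doc_size, minimum=32):
    """Round a document size up to the next power of two (sketch only).

    The 32-byte minimum is an assumption for illustration.
    """
    size = minimum
    while size < doc_size:
        size *= 2
    return size


def coll_mod_command(collection_name):
    """Build the collMod command document that enables usePowerOf2Sizes.

    With a driver such as pymongo this would typically be sent as
    db.command("collMod", collection_name, usePowerOf2Sizes=True).
    """
    return {"collMod": collection_name, "usePowerOf2Sizes": True}


# A 600-byte game state gets a 1024-byte slot, so it can grow to
# 1000 bytes in place without being moved.
slot = power_of_2_allocation(600)
print(slot, power_of_2_allocation(1000) == slot)
print(coll_mod_command("players"))
```

Because freed slots come in a small set of standard sizes, a document that does outgrow its slot leaves behind space that another document is likely to fit into.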

kstirman
1

So A) is this a good idea?

Depends. If the game documents were updated so frequently that they moved on disk a lot, then you might find that padding helps. However, considering that the entire works of Shakespeare can fit into a 4 MB document with some room left, I doubt very much that any string you have will cause a heavy amount of fragmentation; in fact, I would be quite surprised if it did.

The problem that could, in theory, occur is that you get a lot of documents within your freelists and deleted buckets that cannot be reused causing fragmentation to occur.

Not only that but the IO of disk movement can be a killer if it becomes persistent.

B) I'm not even sure how to find the longest game state length, so I'm not sure how far to pad them. And what if game states later exceed my padding length?

Then the idea is useless. In fact, the idea is useless 90% of the time anyway, and if this were to become a problem you would be better off using power-of-2 size allocation on your documents: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes

Using this option would be a far more optimal approach to solving fragmentation issues.

My friend said he had a mongo collection keep blowing up because of fragmentation and when he implemented padding all of his issues went away.

A friend of a friend, of a cousin, of a niece of mine said something similar too...you would be better off testing this for yourself.

I would bet that the bigger problem he had was with indexes and the queries he performed. It is extremely rare for string lengths to cause such a heavy amount of IO from disk movement that you would actually need artificial padding.

Sammaye
0

From your question I understand those strings are just blobs, i.e. they are not structured in a way that allows db queries or filtering on their contents. If this is the case, store them in files, and store the file names in the mongo document.
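A minimal sketch of that layout, assuming the player id is safe to use as a file name. The helpers `save_state` and `load_state` and the field names `state_file` and `state_bytes` are hypothetical, and the returned dict stands in for the document you would store in the players collection.

```python
import os
import tempfile


def save_state(directory, player_id, state):
    """Write the serialized state to a file and return a small player document."""
    data = state.encode("utf-8")
    file_name = f"{player_id}.state"
    with open(os.path.join(directory, file_name), "wb") as handle:
        handle.write(data)
    # Only the file name and size go into the database document, so the
    # document stays small no matter how large the game state grows.
    return {"_id": player_id, "state_file": file_name, "state_bytes": len(data)}


def load_state(directory, player_doc):
    """Read the serialized state back using the file name kept in the document."""
    path = os.path.join(directory, player_doc["state_file"])
    with open(path, "rb") as handle:
        return handle.read().decode("utf-8")


with tempfile.TemporaryDirectory() as directory:
    original = 'a:1:{s:5:"level";i:7;}'  # made-up serialized state
    doc = save_state(directory, "player42", original)
    restored = load_state(directory, doc)

print(doc["state_file"], restored == original)
```

The trade-off is that the files and the documents have to be kept in sync by the application, for example when a player is deleted.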

shx2