I am trying to understand the internal allocation and placement of arrays and hashes (which, from my understanding, are implemented through arrays) in MongoDB documents.
In our domain we have documents with anywhere between thousands and hundreds of thousands of key-value pairs in logical groupings up to 5-6 levels deep (think nested hashes).
We represent the nesting in the keys with a dot, e.g., x.y.z, which upon insertion into MongoDB will automatically become something like:
{
"_id" : "whatever",
"x" : {
"y" : {
"z" : 5
}
}
}
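To make the mapping concrete, here is a minimal Python sketch of how a dotted key expands into the nested shape shown above. The helper name `nest` is hypothetical and the code does not talk to MongoDB; it only mimics the resulting document structure.

```python
def nest(flat):
    """Expand dot-delimited keys into nested dicts (illustrative helper only)."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        # Every path component except the last names an intermediate sub-document.
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


doc = nest({"_id": "whatever", "x.y.z": 5})
```

Keys that share a prefix (say x.y.z and x.y.w) end up in the same sub-document, so the prefix is stored only once.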
The most common operation is incrementing a value, which we do with an atomic $inc, usually 1000+ values at a time with a single update command. New keys are added over time but not frequently, say, 100 times/day.
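Roughly, a batch of increments like this can be assembled as a single update document. The sketch below is plain Python; the key names (`x.y.k0` and so on) and the helper `build_inc_update` are made up for illustration, and the pymongo call is shown only as a comment because it assumes a `collection` object that is not defined here.

```python
def build_inc_update(deltas):
    """Wrap a mapping of dotted path -> increment in one $inc update document."""
    return {"$inc": dict(deltas)}


# Hypothetical batch: 1000 counters under x.y, each incremented by 1.
deltas = {f"x.y.k{i}": 1 for i in range(1000)}
update = build_inc_update(deltas)

# Assuming a pymongo collection, the whole batch goes out as one command:
# collection.update_one({"_id": "whatever"}, update)
```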
It occurred to me that an alternative representation would be to not use dots in names but some other delimiter and create a flat document, e.g.,
{
"_id" : "whatever",
"x-y-z" : 5
}
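For reference, converting the nested form into this flat form could look like the following Python sketch. The function name `flatten` and the choice of "-" as the delimiter are assumptions for illustration; any delimiter works as long as it never appears inside a key, otherwise distinct paths could collide.

```python
def flatten(doc, sep="-", prefix=""):
    """Collapse nested dicts into a single level, joining path parts with `sep`."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the path built so far.
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


flat_doc = flatten({"_id": "whatever", "x": {"y": {"z": 5}}})
```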
Given the number of key-value pairs and the usage pattern in terms of $inc updates and new key insertion, I am looking for guidance on the trade-offs between the two approaches in terms of:
space overhead on disk
performance of $inc updates
performance of new key inserts
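
On the space question, one rough way to compare the two layouts is to estimate the raw BSON size of each. The Python sketch below follows the BSON encoding rules (a 4-byte length prefix and a trailing null byte per document; one type byte plus a null-terminated key per element) for a few value types only. The function name `bson_size` and the sample key names are assumptions, and the result is the encoded document size, not the actual on-disk footprint, which also depends on the storage engine (padding, compression).

```python
def bson_size(doc):
    """Estimate the BSON-encoded size in bytes of a dict (subset of types)."""
    size = 4 + 1  # int32 length prefix + trailing 0x00 terminator
    for key, value in doc.items():
        size += 1 + len(key.encode("utf-8")) + 1  # type byte + key as cstring
        if isinstance(value, dict):
            size += bson_size(value)  # embedded document
        elif isinstance(value, bool):
            size += 1
        elif isinstance(value, int):
            size += 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
        elif isinstance(value, float):
            size += 8
        elif isinstance(value, str):
            size += 4 + len(value.encode("utf-8")) + 1  # length + bytes + 0x00
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return size


# Hypothetical documents: 1000 counters sharing the prefix x.y.
nested_many = {"x": {"y": {f"k{i}": 1 for i in range(1000)}}}
flat_many = {f"x-y-k{i}": 1 for i in range(1000)}

nested_bytes = bson_size(nested_many)
flat_bytes = bson_size(flat_many)
```

Under these rules each level of nesting costs a fixed few bytes per sub-document, while the flat layout repeats the full prefix in every key, so with many keys under a shared prefix the nested form encodes smaller. For a document with a single deep key the flat form is the smaller one.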