Is there a good reason to serialize null elements in a Cosmos DB document or is it better to ignore them?

With the is_defined function I can query for undefined elements similar to how I query for null elements.

Does either consume less RUs? In my tests they seem to perform similarly.

Eli Pulsifer

2 Answers

If your query truly depends on filtering based on the existence of, or value of, an optional property, then do exactly that: either check for existence (or non-existence), or check that an optional property is a specific value you're looking for.
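As a rough illustration of the three kinds of filter described above, here is a minimal sketch. `IS_DEFINED` and `IS_NULL` are built-in Cosmos DB SQL functions; the property name `discount`, the sample documents, and the Python helpers that mimic the filters locally are all hypothetical and only there to show how a missing property differs from a null one.

```python
# Cosmos DB SQL filters; "discount" is a hypothetical optional property.
QUERY_MISSING = "SELECT * FROM c WHERE NOT IS_DEFINED(c.discount)"
QUERY_NULL = "SELECT * FROM c WHERE IS_NULL(c.discount)"
QUERY_VALUE = "SELECT * FROM c WHERE c.discount = 10"


def is_defined(doc, key):
    """Local stand-in for IS_DEFINED: the property exists at all."""
    return key in doc


def is_null(doc, key):
    """Local stand-in for IS_NULL: the property exists and is null."""
    return key in doc and doc[key] is None


# Hypothetical sample documents: missing, null, and concrete value.
docs = [
    {"id": "1"},
    {"id": "2", "discount": None},
    {"id": "3", "discount": 10},
]

missing = [d["id"] for d in docs if not is_defined(d, "discount")]
nulls = [d["id"] for d in docs if is_null(d, "discount")]
tens = [d["id"] for d in docs if d.get("discount") == 10]
```

Under these assumptions, a document that omits the property matches only the "missing" filter, so there is no need to store a null just to make it queryable.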

Storing null properties is an anti-pattern with document databases such as Cosmos DB. It's not required, and if you do decide to do it, you'll have to add new null properties to existing documents every time you add a new property. That is potentially costly, since you'd have to perform a ReplaceDocument() on every single existing document each time you add a property that can be null. The same applies when you decide to remove an optional property and have to clean up all of your extraneous nulls.

Cosmos DB doesn't require every document to be the same, and you'd be giving up a very big benefit by approaching data the same way as in a relational store (where you do have to deal with nulls in table columns). Just imagine a shopping site with thousands of product types, each with varying properties (books, CDs, lawn mowers, coffee...). You'd end up with thousands of null properties per document, which seems like a very unmanageable scenario, not to mention the per-document size limit you'd likely exceed eventually.

Also, you will incur additional RU per write, since every index will need to be updated for every document.

David Makogon

Not sending keys that don't have values will save you a small number of bytes (and thus RU/s); otherwise there isn't any important performance difference in queries.
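One way to avoid sending such keys is to drop them before the document is serialized. The following is only a sketch using plain dictionaries and the standard `json` module; the helper name `strip_nulls` and the sample document are hypothetical, and most SDK serializers offer their own setting for ignoring nulls.

```python
import json


def strip_nulls(value):
    """Recursively drop dictionary keys whose value is None.

    None entries inside lists are kept, because removing them would
    shift the positions of the remaining elements.
    """
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


# Hypothetical product document with two unset optional properties.
doc = {"id": "42", "title": "Dune", "isbn": None, "dims": {"w": 5, "h": None}}

compact = strip_nulls(doc)

# Bytes saved in the serialized payload by omitting the null properties.
saved = len(json.dumps(doc)) - len(json.dumps(compact))
```

The saving per document is small, which matches the point above: it only matters when nulls are numerous or the read volume is high.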

This could be significant if you have VERY sparse values among your keys. For instance, let's say each doc could have 1 of 1 million possible keys, and let's assume ~7 bytes per key. You'd be out of luck if you included all 1 million keys with a null value for all but one, because the keys alone would take 7 MB and your doc can only be 2 MB.
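The arithmetic behind that example can be checked in a few lines. This is just the answer's own numbers restated; treating 1 MB as 1,000,000 bytes is an assumption made here for simplicity.

```python
KEY_BYTES = 7                  # assumed average key size from the example
NUM_KEYS = 1_000_000           # possible keys per document
DOC_LIMIT_BYTES = 2_000_000    # 2 MB document limit (decimal MB assumed)

# Space taken by the key names alone, before any values are counted.
keys_total_bytes = KEY_BYTES * NUM_KEYS

exceeds_limit = keys_total_bytes > DOC_LIMIT_BYTES
```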

Even a single key can add up at scale. If one 7-byte key is null instead of undefined (a much more common situation) in each of 1 million document reads per second, it will theoretically cost 7,000 RU/s to read them. That's about $340 a month spent on a key with a null value, assuming you're doing 1M RPS the whole month (but that would only be 0.8% of your cost, so other optimizations, like using the right indexes, would make bigger differences).
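The 7,000 RU/s figure follows from a simple estimate, sketched below. The rate of roughly 1 RU per KB read is an assumption used to reproduce the number, with 1 KB taken as 1,000 bytes; actual charges depend on the operation and document size.

```python
NULL_KEY_BYTES = 7             # extra bytes per document for the null key
READS_PER_SECOND = 1_000_000   # 1M reads per second, as in the example
RU_PER_KB = 1                  # assumption: about 1 RU per KB read
BYTES_PER_KB = 1_000           # decimal KB assumed

# Extra payload read every second because of the null key.
extra_bytes_per_second = NULL_KEY_BYTES * READS_PER_SECOND

# Theoretical extra throughput consumed by those bytes.
extra_ru_per_second = extra_bytes_per_second // BYTES_PER_KB * RU_PER_KB
```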

Chris Anderson