Working with MongoDB, I'm fairly aware of when normalization makes sense conceptually, but I'm trying to understand whether it actually pays off in terms of data size or query performance.
As an example, let's say I have a list of 2,000 retail stores. For this example I don't care about anything else, so each store is just a name: at least a few characters, but never longer than a hundred.
I also have 20 million records of trips people took to these stores. Each record holds some information about the trip, along with the store name.
In MySQL, the on-disk size of a table can impact query performance quite heavily. Normalizing the store names, so that each string is stored once and the records hold an integer key instead, saves a lot of disk space, and that translates into performance gains.
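As a rough back-of-envelope check on that disk-space claim (the 30-byte average name and 4-byte key below are assumptions, not measurements, and per-row overhead and indexes are ignored):

```python
# Raw payload of repeated strings vs. integer keys across all trip records.
records = 20_000_000
avg_name_bytes = 30   # assumed average store-name length
int_key_bytes = 4     # assumed integer key width

string_bytes = records * avg_name_bytes   # ~600 MB of repeated store names
key_bytes = records * int_key_bytes       # ~80 MB of integer keys
print(f"strings: {string_bytes / 1e6:.0f} MB, keys: {key_bytes / 1e6:.0f} MB")
```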
In MongoDB, data this small and unimportant would normally just be embedded. However, I'm concerned that at 20 million records, query speed and/or file size will suffer because of how much redundant data is stored.
If 5 million people all went to a single store, there will be 5 million embedded copies of that string across the Mongo documents. That takes up far more disk space than an integer key would, and it seems like it could drag down performance.
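For concreteness, here's a minimal sketch of the embedded shape using pymongo; the database, collection, and field names are hypothetical, and it assumes a mongod running on localhost:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["retail"]

# Embedded shape: every trip document carries the full store name.
db.trips.insert_one({
    "person_id": 12345,
    "visited_at": datetime.now(timezone.utc),
    "store_name": "Acme Hardware - Downtown",  # repeated verbatim in every trip
})
```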
I suppose that if I did normalize the data, Mongo would still incur some overhead on every read to look up that additional data.
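Here's a sketch of that normalized alternative, again with hypothetical names: the string lives once in a stores collection, each trip carries only an integer key, and reading names back needs a join-like step such as an aggregation $lookup (or a second query plus an in-memory map):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["retail"]

# Normalized shape: the name is stored once; trips reference it by key.
store_id = 42
db.stores.insert_one({"_id": store_id, "name": "Acme Hardware - Downtown"})
db.trips.insert_one({"person_id": 12345, "store_id": store_id})

# Reading trips together with store names now needs a join-like step:
pipeline = [
    {"$match": {"person_id": 12345}},
    {"$lookup": {
        "from": "stores",
        "localField": "store_id",
        "foreignField": "_id",
        "as": "store",
    }},
]
for trip in db.trips.aggregate(pipeline):
    print(trip["store"][0]["name"])
```

That $lookup (or the extra round trip) is exactly the overhead I'm asking about.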