background & needs
I have documents in MongoDB with a natural primary key of type text (a URL or a sentence; let's call it text). I need to ensure uniqueness. I also often run partial searches (text contains substring), but this part is less critical.
With MongoDB < 4.2, the index key limit is 1024 bytes, forcing me to use a hash as _id. However, version 4.2 removes the Index Key Limit (doc).
question
Using version 4.2, which would be better:
- use text as the _id, or
- use a hashed id (_id: hex_md5(text)) and add an index on the text field (potentially a text index to speed up the partial searches)?
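For concreteness, here is a minimal sketch of the two document shapes being compared. It runs under Node.js without a database. The helper names (hexMd5, docWithTextId, docWithHashedId), the sample URL, and the collection name docs are made up for illustration, and hexMd5 is only assumed to behave like the shell's hex_md5 helper.

```javascript
// Sketch only: builds the two candidate document shapes without touching a database.
const crypto = require("crypto");

// Assumed stand-in for the shell helper hex_md5():
// MD5 of the UTF-8 bytes, returned as 32 hex characters.
function hexMd5(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

// Option 1: the natural key itself is the _id,
// so the built-in unique _id index enforces uniqueness.
function docWithTextId(text) {
  return { _id: text };
}

// Option 2: a fixed-length hash is the _id,
// and the original text is kept in its own field.
function docWithHashedId(text) {
  return { _id: hexMd5(text), text: text };
}

// Hypothetical sample value, not taken from real data.
const sample = "https://example.com/some/long/path?query=value";
console.log(docWithTextId(sample));
console.log(docWithHashedId(sample));

// With option 2, the extra index on the text field would be created in the
// shell with something like db.docs.createIndex({ text: 1 }) or
// db.docs.createIndex({ text: "text" }); "docs" is a hypothetical collection name.
```

In both shapes uniqueness comes from the _id index; the difference is only whether that index stores the full text or a 32-character digest of it.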
To rephrase: what are the downsides of having a long text as an _id? Is the index created on it any different from a "regular" index? Do I gain anything by using MD5 ids?