
Take, for example, a blog post where a unique slug is generated from the post's title: sample_blog_post. Instead of storing a Mongo ObjectId as the _id, say you store the slug in the _id. Besides the obvious problem that the slug may change if the title changes, are there major performance disadvantages to using a string instead of a numerical _id? This could become problematic if the number of posts became very large, say, over a million. But if the number of posts were relatively low, say, 2000, would it make much of a difference? So far the only thing about the ObjectId that I think I'd take advantage of is the created_on date that comes for free.

So in summation, is it worth it to store the slug as the _id and not use an ObjectId? There seems to be discussion on how to store alternate values as an _id, but not the performance advantages/disadvantages to it.

nini

1 Answer


So in summation, is it worth it to store the slug as the _id and not use an ObjectId?

In my opinion, no. The performance difference will be negligible for most scenarios (except paging), but

  • The old discussion of surrogate primary keys comes up. A "slug" is not a very natural key. Yes, it must be unique, but as you already pointed out, changing the slug shouldn't be impossible. This alone would keep me from bothering...
  • Having a monotonic _id key can save you from a number of headaches, most importantly by letting you avoid expensive paging via skip and limit (use $lt/$gt on the _id instead).
  • MongoDB limits the size of an index entry to less than 1024 bytes. While not pretty, URLs are allowed to be a lot longer. If someone entered a longer slug, it wouldn't be found because it's silently dropped from the index.
  • It's a good idea to have a consistent interface, i.e. to use the same type of _id on all, or at least, most of your objects. In my code, I have a single exception where I'm using a special hash as id because the value can't change, the collection has extremely high write rates and it's large.
  • Let's say you want to link to the article in your management interface (not the public site). Which link would you use? Normally the id, but now the id and the slug are equivalent. A simple bug (such as allowing an empty slug) would then be hard to recover from, because the user couldn't even get to the article in the management interface anymore.
  • You'll be dealing with charset issues. I'd suggest not even using the slug itself to look up the article, but the slug's hash.
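To illustrate the paging point, here is a minimal sketch of range-based paging on _id. The helper name `page_query` is hypothetical, and the integer ids stand in for ObjectIds; the returned dictionary is shaped so that it could be passed to a driver call such as PyMongo's `collection.find(**kwargs)`, which I'm assuming you'd use.

```python
from typing import Any, Dict, Optional


def page_query(last_seen_id: Optional[Any] = None,
               page_size: int = 20,
               newest_first: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for one page of results, ordered by _id.

    Hypothetical helper: instead of skipping N documents, continue from the
    last _id that the previous page returned.
    """
    operator = "$lt" if newest_first else "$gt"
    direction = -1 if newest_first else 1
    # First page: no range condition. Later pages: strictly past the last _id.
    query = {} if last_seen_id is None else {"_id": {operator: last_seen_id}}
    return {"filter": query, "sort": [("_id", direction)], "limit": page_size}


# Integers are used here only as stand-ins for ObjectId values.
first_page = page_query(page_size=2)
next_page = page_query(last_seen_id=1002, page_size=2)
```

Because the range condition can use the _id index, the server does not have to walk past all the skipped documents, which is what makes skip-based paging expensive on deep pages.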

Essentially, you'd end up with a schema like

{ "_id" : ObjectId("a237b45..."), // PK
  "slug" : "mongodb-is-fun", // not indexed
  "hash" : "5af87c62da34" } // indexed, unique
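A rough sketch of how the hash field could be filled and queried, assuming MD5 over the UTF-8 bytes of the slug. The function names are hypothetical, and the NFC normalization step is my own assumption for sidestepping some of the charset issues, not something the schema requires.

```python
import hashlib
import unicodedata
from typing import Dict


def slug_hash(slug: str) -> str:
    """Return a fixed-length hex digest for a slug (MD5 is an assumption)."""
    # Normalize first so visually identical slugs hash to the same value.
    normalized = unicodedata.normalize("NFC", slug)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def post_fields(slug: str) -> Dict[str, str]:
    """Fields to store next to the _id; the _id itself stays an ObjectId."""
    return {"slug": slug, "hash": slug_hash(slug)}


def lookup_filter(slug: str) -> Dict[str, str]:
    """Query filter that finds a post through the indexed hash field."""
    return {"hash": slug_hash(slug)}


fields = post_fields("mongodb-is-fun")
```

The digest always has the same length no matter how long the slug is, so the indexed value stays well below the index size limit. The unique index on "hash" would have to be created separately, for example with `create_index("hash", unique=True)` in PyMongo.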
mnemosyn
  • What kind of hash function would you use on the slug? I am contemplating using a url like http://localhost/posts// – nini Oct 25 '13 at 18:11
  • I don't think that matters. A simple MD5 would do in this case, but feel free to use a SHA hash. The hash in the URL doesn't help much. Either use the slug only and hash it for a lookup, or use `/posts/id/slug` (like SO does), which combines the simplicity (and immutability) of the id with the SEO advantage of the slug. – mnemosyn Oct 25 '13 at 18:59