6

I'm trying to create short non-colliding strings from longer strings in Ruby. What's the best way to do this? Base64 encode a MD5 hash?

This is the use case:

loop do
  key = short_hash("#{user_id}-#{timestamp}")
  break if $redis.setnx(key, "0")
end

I don't want key to be too long.

Aaron Gibralter
  • 4,773
  • 3
  • 35
  • 50
  • There's a bunch of questions on this site about similar topics. Try searching for hashing topics. Here's one: http://stackoverflow.com/questions/4066601/developing-a-url-shortener/4066615#4066615 – Jeff Swensen Feb 17 '11 at 00:53
  • 1
    @Sugerman: That question is in Python. – Andrew Grimm Feb 17 '11 at 01:12
  • 2
    What you might gather from the response in that (and other) threads if you read them is that the "best way" to do this is language agnostic. Choose your hashing algorithm first, and then worry about the language-specific implementation. – Jeff Swensen Feb 17 '11 at 01:25
  • Understood -- I looked through the site and couldn't find a recommendation for a hashing algorithm that produces "short" (less than, say, 12 characters) strings. I completely understand that collisions are possible with a short hash, but my loop would ensure that collisions get tried again. – Aaron Gibralter Feb 18 '11 at 15:52

2 Answers2

5

I often use a SHA has for this similar to the example you have. It's not guaranteed to be unique, but is usually good enough for most purposes:

require 'digest/sha1'
Digest::SHA1.hexdigest("#{user_id}-#{Time.now.to_i}-#{rand}")

The ruby UUID gem is another option.

But in your specific case since you're using redis, why not just use the redis INCR command? Then you can guarantee the uniqueness at least within your database. For example:

unique_key = $redis.incr('users:next')
Mat Schaffer
  • 1,634
  • 1
  • 15
  • 24
  • Hmm I was thinking about using `incr`, but I need to store a value for the unique_key... I guess I could do `uid = $r.incr('uids'); $r.set(uid, value)` – Aaron Gibralter Feb 18 '11 at 05:32
  • So I ended up going with `incr` -- but as for my original question, I was hoping to have a shorter hash than a `Digest::SHA1.hexdigest`. I guess I could use base64 encoding... – Aaron Gibralter Feb 18 '11 at 15:50
4

You can use a hash function to create shorter strings that aren't likely to collide. However, the Pigeonhole principle guarantees that you will be able to find two longer strings that will hash to the same value.

To generate truly unique values, you may have to assign a sequential identification number. But this would also require that you keep track of which identification number you have associated with which input string.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
  • Sorry, I forgot to mention that I will be checking for collisions and retrying. I just want to avoid "retries" as much as possible. – Aaron Gibralter Feb 17 '11 at 00:51