If security is no issue, what you are describing is in my opinion not a hash function. A hash function is a one-way function, meaning computing the hash is easy but reverting it is "hard" or, ideally, impossible.
Your requirements instead describe an injective function Given any x1, x2 in your domain X the following holds:
For all x1, x2 element of X, x1 != x2 => f(x1) != f(x2)
f(x) = x is such a function, f(x) = x² is not. In plain English: you want to have different results if your inputs are different, same results only if the inputs are the same. It is true that this also is true for secure hashes, but they additionally provide the one-way characteristics such as the property of not being able (easily) to find x if you are only given f(x), among others. As far as I understood, you don't need these security properties.
Trivially, such an injective mapping from String to Float would be given by simply interpreting the "String bytes" as "Float bytes" from now on, i.e. you interpret the bytes differently (think C:
unsigned char *bytes = "...";
double d = (double)bytes;
). But, there is as downside to this - the real trouble is that Float has a maximum precision, so you will run into an overflow situation if your strings are too long (Floats are internally represented as double
values, that's 8 bytes on a 32 bit machine). So not enough space for almost any use case. Even MD5-ing your strings first doesn't solve the problem - MD5 output is already 16 bytes long.
So this could be a real problem, depending on your exact requirements. Although MD5 (or any other hash) will mess sufficiently with the input to make it as random as possible, you still cut the range of possible values from 16 bytes to effectively 8 bytes. (Note: Truncating random 16 byte output at 8 bytes is generally considered "secure" in terms of preserving the randomness. Elliptic Curve Cryptography does something similar. But as far as I know, nobody can really prove it, but neither could someone prove the contrary so far). So a collision is much more likely with your restricted Float range. By the birthday paradox, finding a collision takes sqrt(number of values in a finite range) tries. For MD5 this is 2^64, but for your scheme it's only 2^32. That's still very, very unlikely to produce a collision. It's probably something in the order of winning the lottery while at the same time being hit by a lightning. If you could live with this minimal possibility, go for it:
def string_to_float(str)
Digest::MD5.new.digest(str).unpack('D')
end
If uniqueness is of absolute priority I would recommend to move from floats to integers. Ruby has built-in support for large integers that are not restricted by the internal constraints of a long
value (that's what a Fixnum boils down to). So any arbitrary hash output could be represented as a large integer number.