For a pseudo-random generator (game-related) I need to create a seed from a string. It's not for security, but I'd still like this seed to be as random as possible.

I don't use GetHashCode():

  • the implementation might differ per .NET version.
  • I'm also not sure about the randomness of GetHashCode().

Right now I use MD5 - see code below:

public static int GetSeedFromString(string value)
{
    using ( var md5 = MD5.Create() )
    {
        var inputBytes      = Encoding.ASCII.GetBytes(value);
        byte[] hashBytes    = md5.ComputeHash(inputBytes);

        // FIXME: force endianness for potential different systems
        return BitConverter.ToInt32(hashBytes, 0);
    }
}

Two problems:

  • it depends on endianness (for now) - however, this might be easily fixable
  • I'm not sure about the spread of this randomness. It doesn't have to be perfect, but if there's a simpler method with better results I prefer that.
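
For the first point, a minimal sketch of an endianness-independent variant: instead of `BitConverter.ToInt32`, the first four hash bytes are combined with explicit shifts, so the result no longer depends on the machine's byte order. The class and method names are hypothetical, and switching from ASCII to UTF-8 is an assumption (ASCII encodes every non-ASCII character as `?`, so such strings would share a seed).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SeedHelper
{
    // Hypothetical name; same idea as GetSeedFromString, minus the
    // dependency on the platform's native byte order.
    public static int GetSeedFromStringPortable(string value)
    {
        using (var md5 = MD5.Create())
        {
            // Assumption: UTF-8 rather than ASCII, so that non-ASCII
            // characters are not all collapsed to '?'.
            byte[] inputBytes = Encoding.UTF8.GetBytes(value);
            byte[] hashBytes = md5.ComputeHash(inputBytes);

            // Interpret the first four bytes as a little-endian Int32,
            // whatever BitConverter.IsLittleEndian reports.
            return hashBytes[0]
                 | (hashBytes[1] << 8)
                 | (hashBytes[2] << 16)
                 | (hashBytes[3] << 24);
        }
    }
}
```

The same string then yields the same seed on every platform, e.g. `new Random(SeedHelper.GetSeedFromStringPortable("level-1"))`.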

Is there any more common way to generate a pseudo random seed from a string?

Dirk Boer
  • ***Is*** there an endianness problem though? `ASCII` is one byte per character. – Sweeper Apr 24 '20 at 11:28
  • May be [interesting](https://softwareengineering.stackexchange.com/q/49550/156546). – Sinatr Apr 24 '20 at 11:29
  • First of all, `BitConverter.IsLittleEndian` can tell you whether you need to reverse the array or not, so you're right that this should be easily fixable. Secondly, what exactly do you mean by "spread"? Hashes are specifically designed to reduce collisions, so if you want to avoid those, a hash doesn't seem like a bad choice. But it all depends on how strict your requirements are. Have you noticed anything that makes you think a hash is unsuited? –  Apr 24 '20 at 11:35
  • How about a simple and straightforward: `if (value is null) return 0; var hash = 17; unchecked { foreach (var c in value) { hash = (hash * 31) ^ c; } } return hash;`? -- 17 and 31 are prime numbers and should cause a somewhat even distribution. Not sure if other primes would yield a much better distribution. – Corak Apr 24 '20 at 11:35
  • MD5 is a cryptographic hash, so it is slow. It is also broken, so it does not provide a lot of security. If you don't need crypto-level security then I would suggest a faster, non-cryptographic hash like the [FNV hash](http://isthe.com/chongo/tech/comp/fnv/). That produces unsigned numeric output in a range of sizes. – rossum Apr 24 '20 at 13:47
  • I suspect you're putting a lot of effort into something that doesn't matter. A seed is just the starting point for the sequence of values produced by your RNG. If you're using a crypto-quality RNG, the RNG itself will give lack of predictability to the generated sequence, regardless of starting point. If it's not crypto-quality, then a knowledgeable hacker will be able to determine the sequence no matter how much work you went through to seed it. You said it's not crypto quality, so in reality the seed is the entry point into the RNG's cycle. It doesn't affect quality of the sequence. – pjs Apr 24 '20 at 15:24
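
The FNV suggestion from the comments could look roughly like the sketch below. It assumes the 32-bit FNV-1a variant (offset basis 2166136261, prime 16777619) over the UTF-8 bytes of the string; the class and method names are hypothetical, and returning 0 for `null` is an arbitrary choice borrowed from Corak's snippet.

```csharp
using System;
using System.Text;

public static class Fnv1aSeed
{
    private const uint OffsetBasis = 2166136261; // 32-bit FNV offset basis
    private const uint Prime = 16777619;         // 32-bit FNV prime

    // Hypothetical helper: derives an int seed from a string via FNV-1a.
    public static int GetSeed(string value)
    {
        if (value == null) return 0; // arbitrary choice for null input

        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            unchecked
            {
                hash ^= b;     // FNV-1a: XOR the byte in first...
                hash *= Prime; // ...then multiply by the prime (wraps mod 2^32)
            }
        }

        // Reinterpret the 32 bits as a signed int for use as a seed.
        return unchecked((int)hash);
    }
}
```

Because it works byte by byte on integers, the result does not depend on endianness, and it avoids creating an `MD5` instance for every call.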
