What is causing this implementation of GetHashCode to be 20 times slower than .net's implementation?

Question

I got the idea of a Substring struct from this post and this one. The second post has the implementation of .net's String.GetHashCode(). (I'm not sure which version of .net this is from.)

Here is the implementation. (GetHashCode is taken from the second source listed above.)

public struct Substring
{
    private string String;
    private int Offset;
    public int Length { get; private set; }
    public char this[int index] { get { return String[Offset + index]; } }

    public Substring(string str, int offset, int len) : this()
    {
        String = str;
        Offset = offset;
        Length = len;
    }

    /// <summary>
    /// See http://www.dotnetperls.com/gethashcode
    /// </summary>
    /// <returns></returns>
    public unsafe override int GetHashCode()
    {
        fixed (char* str = String + Offset)
        {
            char* chPtr = str;
            int num = 352654597;
            int num2 = num;
            int* numPtr = (int*)chPtr;
            for (int i = Length; i > 0; i -= 4)
            {
                num = (((num << 5) + num) + (num >> 27)) ^ numPtr[0];
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ numPtr[1];
                numPtr += 2;
            }
            return (num + (num2 * 1566083941));
        }
    }
}

Here's a unit test:

    [Test]
    public void GetHashCode_IsAsFastAsString()
    {
        var s = "The quick brown fox";
        var sub = new Substring(s, 1, 5);
        var t = "quick";
        var sum = 0;

        sum += sub.GetHashCode(); // make sure GetHashCode is jitted 

        var count = 100000000;
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += t.GetHashCode();
        var t1 = sw.Elapsed;
        sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += sub.GetHashCode();
        var t2 = sw.Elapsed;

        Debug.WriteLine(sum.ToString()); // make sure we use the return value
        var m1 = t1.Milliseconds;
        var m2 = t2.Milliseconds;
        Assert.IsTrue(m2 <= m1); // fat chance
    }

The problem is that m1 is 10 milliseconds and m2 is 190 milliseconds. (Note: this is with 1000000 iterations.) FYI, I ran this on 64-bit .NET 4.5, as a Release build with optimizations turned on.

Not related to the problem, but did you write this class in an effort to save memory? — Matthew, Sep 10 '14 at 18:14
You are making traditional benchmarking mistakes. Like including the jitting overhead in the measurement. And not actually using the return value, allowing the jitter optimizer to eliminate the code completely. — Hans Passant, Sep 10 '14 at 18:17
That's a good point. So I went back and added another loop of sub.GetHashCode() before doing any timing. Same result - to the millisecond. — bright, Sep 10 '14 at 18:22
Are you compiling for debug or release? Trying the code in LINQPad, the substring version takes ~10 times as long with optimizations off, but is about the same speed with optimizations on. — Richard Deeming, Sep 10 '14 at 18:30
I've compiled for Release, and verified that Optimize Code is checked in project settings. — bright, Sep 10 '14 at 18:34
@bright: `o-: Substring: 0.1175266; String: 0.0133497`, `o+: Substring: 0.0225464; String: 0.0253571`; it doesn't seem to make any significant difference whether I test the `string` or `Substring` method first. — Richard Deeming, Sep 10 '14 at 18:56
Thanks - are you on 32 bit .net by any chance? I see now that the GetHashCode() above is optimized for 32 bits. 64 bit code would be a lot faster since it can handle 8 bytes at a time. — bright, Sep 10 '14 at 18:59
@bright: I've just tried in 64-bit LINQPad, and I'm seeing very similar results to my previous comment for the same code. The code I'm using is: http://pastebin.com/mSc8dYsB — Richard Deeming, Sep 10 '14 at 19:19
I tried the code you pasted, and the difference I'm seeing with optimizations is only 3x: 32ms vs 10ms. Also, the 3x slowdown is very likely a 32/64 bit difference, given that the code listed is for 32 bit. If you would like to put your comments into an answer I'm happy to accept it. Cheers. — bright, Sep 10 '14 at 19:30
You are still not using `sum`. Add `GC.KeepAlive(sum);`. The debugger suppresses optimizations on launch. Start without the debugger. Increase the test duration by 10x or more. — usr, Sep 10 '14 at 20:17
Actually, I am using `sum` in the `Debug.WriteLine`. Your other points are good. — bright, Sep 11 '14 at 07:36
No, you're not, @usr is right. Debug.WriteXXX statements are removed when running in release mode, leading to you _not_ using `sum`. — Abel, Jun 02 '18 at 02:11
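
The benchmarking advice in the comments above (warm up before timing, keep the result alive, run long enough) can be sketched roughly as follows. This is only an illustration: `HashBenchmark` and `Measure` are hypothetical names, not anything from the question, and the delegate call adds a fixed per-iteration cost to both candidates, so it narrows ratios rather than measuring the hash alone. The sketch also reads `TimeSpan.TotalMilliseconds`, because `TimeSpan.Milliseconds` returns only the 0 to 999 millisecond component of the elapsed time.

```csharp
using System;
using System.Diagnostics;

static class HashBenchmark
{
    // Hypothetical helper: times `iterations` calls of `hash`
    // and returns the total elapsed milliseconds.
    public static double Measure(Func<int> hash, int iterations)
    {
        int sum = 0;

        // Warm-up so JIT compilation is not part of the measurement.
        for (int i = 0; i < 1000; ++i)
            sum += hash();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
            sum += hash();
        sw.Stop();

        // Consume the result in a way that survives Release builds
        // (Debug.WriteLine calls are compiled out when DEBUG is not defined).
        GC.KeepAlive(sum);

        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        var t = "quick";
        double ms = Measure(() => t.GetHashCode(), 100000000);
        Console.WriteLine("string.GetHashCode: " + ms + " ms");
    }
}
```

Run it as a Release build without the debugger attached, as suggested above, and call `Measure` once per candidate with the same iteration count.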

score -1 · Accepted Answer · answered Sep 11 '14 at 07:56

Clued by a comment, I double checked to make sure that optimized code was running. It turns out that an obscure Debugger setting was disabling optimizations. So I unchecked Tools – Options – Debugging – General – Suppress JIT optimization on module load (Managed only). This caused optimized code to load properly.
Even with optimizations turned on there is still about a 3x to 6x difference. However, this might be attributable to the fact that the code above is the 32-bit .NET version and I'm running 64-bit .NET. Porting the 64-bit implementation of string.GetHashCode to Substring is not as easy because it relies on a zero end-of-string marker (which is in fact a bug).

At this time I'm disappointed about not getting parity performance, but this was an excellent use of my time in learning about some of the perils and pitfalls of optimizing C#.

You're surprised that you don't get parity performance, but you're comparing apples with pears. As a result of your indexing, the start of your loop doesn't begin at a dword boundary. That results in extremely slow assembly code. If you want parity, make sure you align your loop start. Also make sure your strings are larger. 90 percent is now overhead in your code and you don't use `sum`. To compare performance you'll need a radically different setup and you'll reach a different conclusion: that you _can write code with the same performance as most .NET internals_ — Abel, Jun 02 '18 at 02:24
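
One more thing worth checking in the question's code: in C#, `String + Offset` with a `string` field and an `int` is string concatenation, not pointer arithmetic, so `fixed (char* str = String + Offset)` pins a newly allocated string such as "The quick brown fox1" on every call and hashes its first `Length` characters. That allocation may account for part of the measured gap, and the hash does not cover the intended range. Below is a minimal sketch of an allocation-free variant, under some stated assumptions: `SubstringSketch` and `Pair` are hypothetical names, the code uses safe indexing instead of pointers (so it pays for bounds checks), it pads an odd trailing character with zero rather than reading past the end, and it is not claimed to return the same values as any runtime's `string.GetHashCode`.

```csharp
using System;

public struct SubstringSketch
{
    private readonly string _text;
    private readonly int _offset;
    private readonly int _length;

    public SubstringSketch(string text, int offset, int length)
    {
        _text = text;
        _offset = offset;
        _length = length;
    }

    // Packs two UTF-16 code units into one int (low unit first);
    // a missing second unit at the end of the range is treated as 0.
    private int Pair(int i, int end)
    {
        int lo = _text[i];
        int hi = i + 1 < end ? _text[i + 1] : 0;
        return lo | (hi << 16);
    }

    // Same mixing steps as the loop in the question, but reading
    // only the characters in [_offset, _offset + _length).
    public override int GetHashCode()
    {
        unchecked
        {
            int num = 352654597;
            int num2 = num;
            int end = _offset + _length;
            for (int i = _offset; i < end; i += 4)
            {
                num = (((num << 5) + num) + (num >> 27)) ^ Pair(i, end);
                if (i + 2 >= end)
                    break;
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ Pair(i + 2, end);
            }
            return num + (num2 * 1566083941);
        }
    }

    static void Main()
    {
        // "quick" starts at index 4 of the sentence.
        var a = new SubstringSketch("The quick brown fox", 4, 5);
        var b = new SubstringSketch("quick", 0, 5);
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

Because only characters inside the range are read, equal substrings hash equally regardless of what surrounds them in the parent string. An unsafe version could pin the string with `fixed (char* p = text)` and start reading at `p + offset` to avoid the bounds checks.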
