
I was testing different ways of generating a timestamp when I found something surprising (to me).

Calling Windows's GetSystemTimeAsFileTime through P/Invoke is about 3x slower than calling DateTime.UtcNow, which internally uses the CLR's wrapper for the same GetSystemTimeAsFileTime.

How can that be?

Here's DateTime.UtcNow's implementation:

public static DateTime UtcNow {
    get {
        long ticks = 0;
        ticks = GetSystemTimeAsFileTime();
        return new DateTime( ((UInt64)(ticks + FileTimeOffset)) | KindUtc);
    }
}

[MethodImplAttribute(MethodImplOptions.InternalCall)] // Implemented by the CLR
internal static extern long GetSystemTimeAsFileTime();

Core CLR's wrapper for GetSystemTimeAsFileTime:

FCIMPL0(INT64, SystemNative::__GetSystemTimeAsFileTime)
{
    FCALL_CONTRACT;

    INT64 timestamp;

    ::GetSystemTimeAsFileTime((FILETIME*)&timestamp);

#if BIGENDIAN
    timestamp = (INT64)(((UINT64)timestamp >> 32) | ((UINT64)timestamp << 32));
#endif

    return timestamp;
}
FCIMPLEND;

My test code utilizing BenchmarkDotNet:

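using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program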
{
    static void Main() => BenchmarkRunner.Run<Program>();

    [Benchmark]
    public DateTime UtcNow() => DateTime.UtcNow;

    [Benchmark]
    public long GetSystemTimeAsFileTime()
    {
        long fileTime;
        GetSystemTimeAsFileTime(out fileTime);
        return fileTime;
    }

    [DllImport("kernel32.dll")]
    public static extern void GetSystemTimeAsFileTime(out long systemTimeAsFileTime);
}

And the results:

                  Method |     Median |    StdDev |
------------------------ |----------- |---------- |
 GetSystemTimeAsFileTime | 14.9161 ns | 1.0890 ns |
                  UtcNow |  4.9967 ns | 0.2788 ns |
i3arnon
  • CLR can call it directly. Pinvoke goes through marshalling layer. – David Heffernan Jun 18 '16 at 15:27
  • @DavidHeffernan even when the parameters don't need marshalling? – i3arnon Jun 18 '16 at 15:28
  • @i3arnon: Something has to analyze them to prove that. – Ben Voigt Jun 18 '16 at 15:31
  • @BenVoigt where does that layer come in? Can I avoid it somehow? – i3arnon Jun 18 '16 at 15:33
  • C++/CLI emits assemblies using the `internalcall` calling convention just like the CLR implementation, which avoids p/invoke overhead by assuming that the callee is aware of .NET memory layout and will take care of things. – Ben Voigt Jun 18 '16 at 15:37
  • The one thing you might try is `unsafe` using a pointer, instead of an `out` parameter. With a pointer, *your* code is responsible for performing pinning, and you can outright skip it for a stack variable. – Ben Voigt Jun 18 '16 at 15:41
  • @BenVoigt tried it. It had no effect: https://gist.github.com/i3arnon/fc61ba3ef9553e0e048eb8d14aaa5dc2 – i3arnon Jun 18 '16 at 15:55
  • Microsoft has documented as part of the CoreCLR project what it takes to write unmanaged code that can be directly called from a managed program. Details are very important, it is far too easy to create a "GC hole". The kind of problem that the pinvoke marshaller solves for you, at the cost of some overhead. You have to understand everything that [this article](https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-code-guide.md#2.1) says. – Hans Passant Jun 18 '16 at 15:58

2 Answers

When managed code invokes unmanaged code, the runtime performs a stack walk to make sure the calling code has the UnmanagedCode permission that allows it to do so.

That stack walk happens at run time and has a substantial performance cost.

It's possible to remove the run-time check (there's still a JIT compile-time one) by using the SuppressUnmanagedCodeSecurity attribute:

[SuppressUnmanagedCodeSecurity]
[DllImport("kernel32.dll")]
public static extern void GetSystemTimeAsFileTime(out long systemTimeAsFileTime);

This brings my implementation roughly halfway to the CLR's:

                  Method |    Median |    StdDev |
------------------------ |---------- |---------- |
 GetSystemTimeAsFileTime | 9.0569 ns | 0.7950 ns |
                  UtcNow | 5.0191 ns | 0.2682 ns |

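Keep in mind, though, that suppressing this check can be extremely risky security-wise, because it removes the run-time demand that every caller on the stack holds the UnmanagedCode permission.

Additionally passing an unsafe pointer instead of an out parameter, as Ben Voigt suggested, closes roughly half of the remaining gap: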

                  Method |    Median |    StdDev |
------------------------ |---------- |---------- |
 GetSystemTimeAsFileTime | 6.9114 ns | 0.5432 ns |
                  UtcNow | 5.0226 ns | 0.0906 ns |
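
For reference, here is a minimal sketch of what combining the two changes might look like. The NativeTime class and GetFileTimeUtc method are hypothetical names used only for illustration, the project has to be compiled with unsafe code enabled, and the numbers in the table above come from my own benchmark, not from this exact snippet:

```csharp
using System.Runtime.InteropServices;
using System.Security;

// Hypothetical wrapper class, for illustration only.
public static unsafe class NativeTime
{
    // Skips the run-time UnmanagedCode stack walk and takes a raw pointer,
    // so the marshaller has no out parameter to pin.
    [SuppressUnmanagedCodeSecurity]
    [DllImport("kernel32.dll")]
    private static extern void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

    // Returns the current UTC time as a Windows FILETIME value
    // (100 ns intervals since 1601-01-01).
    public static long GetFileTimeUtc()
    {
        long fileTime = 0;                  // stack local, never relocated by the GC
        GetSystemTimeAsFileTime(&fileTime); // no pinning needed for a local
        return fileTime;
    }
}
```

A benchmark method would then simply return NativeTime.GetFileTimeUtc().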
i3arnon
  • Thanks, the combination of `SuppressUnmanagedCodeSecurity` with passing a pointer (`unsafe`) is a real winner. On my system it makes the call twice as fast as `DateTime.UtcNow` (2.3 ns vs 5.3 ns) in 64-bit mode, or roughly four times as fast as only suppressing the stack walk (8.9 ns). Strangely, the difference is less pronounced in 32-bit mode where the difference is 3.2 ns vs 4.3 ns (yes, `DateTime.UtcNow` is faster under WOW64 than in native 64-bit mode). Kudos for coming up with this winning combo! – DarthGizka May 06 '20 at 16:33

The CLR almost certainly passes a pointer to a local (automatic, stack) variable to receive the result. The stack doesn't get compacted or relocated, so there's no need to pin memory, etc, and when using a native compiler, such things aren't supported anyway so there's no overhead to account for them.

In C# though, the p/invoke declaration is compatible with passing a member of a managed class instance living in the garbage-collected heap. P/invoke has to pin that instance or else risk having the output buffer move during/before the OS function writes to it. Even though you do pass a variable stored on the stack, p/invoke still must test and see whether the pointer is into the garbage collected heap before it can branch around the pinning code, so there's non-zero overhead even for the identical case.

It's possible that you could get better results using

[DllImport("kernel32.dll")]
public unsafe static extern void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

With the out parameter gone, p/invoke no longer has to deal with aliasing and heap compaction; that is now entirely the responsibility of your code that sets up the pointer.
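
To make that responsibility concrete, here is a rough sketch of the two call-site cases. TimestampHolder and its members are hypothetical names invented for this example, and it needs to be compiled with unsafe code enabled. A stack local can have its address taken directly, while a field of a heap object has to be pinned with fixed by the caller:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical class, for illustration only.
public unsafe class TimestampHolder
{
    [DllImport("kernel32.dll")]
    private static extern void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

    private long _lastFileTime; // part of a GC-heap object, so it can move

    // Stack local: the GC never relocates it, so no pinning is required.
    public static DateTime ReadToLocal()
    {
        long fileTime = 0;
        GetSystemTimeAsFileTime(&fileTime);
        return DateTime.FromFileTimeUtc(fileTime);
    }

    // Heap field: the caller must pin the object for the duration of the call.
    public void ReadToField()
    {
        fixed (long* p = &_lastFileTime)
        {
            GetSystemTimeAsFileTime(p);
        }
    }

    public long LastFileTime => _lastFileTime;
}
```

Only the second method pays for pinning, which is the cost the out-parameter declaration cannot rule out up front.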

Ben Voigt