I'm trying to call a C function from a .NET application. This is what I do:

public unsafe class Simd
{
    [UnmanagedFunctionPointer(CallingConvention.Winapi)]
    public delegate void MatrixMultiplyDelegate(float* left, float* right);

    public static MatrixMultiplyDelegate MatrixMultiply;

    public static void LoadSimdExtensions()
    {
        string assemblyPath = "Derm.Simd.dll";

        // Really calls 'LoadLibrary', 'GetProcAddress', 'FreeLibrary' from Kernel32.dll
        IntPtr address = GetProcAddress.GetAddress(assemblyPath, "Matrix4x4_Multiply_SSE");

        if (address != IntPtr.Zero) {
            MatrixMultiply = (MatrixMultiplyDelegate)Marshal.GetDelegateForFunctionPointer(address, typeof(MatrixMultiplyDelegate));
        }
    }
}

The loaded function is declared as follows:

extern "C" {

    void __declspec(dllexport) Matrix4x4_Multiply_SSE(float *left, float *right);

}

Sadly, I get the following exception when calling GetDelegateForFunctionPointer:

InvalidFunctionPointerInDelegate:

Invalid function pointer 0xb81005 was passed into the runtime to be converted to a delegate.

What am I doing wrong?

Luca

1 Answer

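First of all, are you sure you are using the __stdcall calling convention?

C# uses the __stdcall calling convention by default for unmanaged function pointers (CallingConvention.Winapi resolves to it on Windows), while C++ uses __cdecl by default if you don't specify one! The export has to say so explicitly: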

extern "C" void __declspec(dllexport) __stdcall Matrix4x4_Multiply_SSE(float *left, float *right);

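Second... you cannot call FreeLibrary if you are going to use that method. The delegate points into the DLL's code, so the library has to stay loaded for as long as the delegate is in use. Load the library once and keep it in memory. In practice you never need to call FreeLibrary; the operating system will unload the library when your process exits.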

Third... are you sure that using SSE multiplication through a delegate call to a P/Invoke function is faster than performing it in pure C#? P/Invoke calls are very expensive!

Take a look at the XNA matrix multiplication code with Reflector: it is handwritten in C# and is faster for single matrices.

If you need to multiply 10,000 matrices all together, then I would suggest SSE code in your DLL that performs all 10,000 multiplications in optimized native code. For a single multiplication, though, doing it in C# is faster, without P/Invoke and without any delegate.
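To make the batching idea concrete, here is a minimal sketch of what such an entry point could look like. The name Matrix4x4_MultiplyBatch, the separate result buffer, and the row-major layout are all assumptions for illustration, not anything taken from Derm.Simd.dll. The inner product is plain scalar code so the sketch compiles anywhere; an SSE kernel would replace that part.

```cpp
#include <cassert>

// Hypothetical batched entry point (name, signature, and row-major layout are
// assumptions). Each buffer holds `count` consecutive 4x4 matrices of 16 floats.
// `result` must not overlap `left` or `right`.
// On Windows the declaration would also carry __declspec(dllexport) and a
// calling convention that matches the managed delegate.
extern "C" void Matrix4x4_MultiplyBatch(const float* left, const float* right,
                                        float* result, int count)
{
    // Debug-only sanity check on the arguments.
    assert(left && right && result && count >= 0);

    for (int m = 0; m < count; ++m) {
        const float* a = left + 16 * m;
        const float* b = right + 16 * m;
        float* r = result + 16 * m;

        // Scalar 4x4 product r = a * b; an SSE kernel would go here instead.
        for (int row = 0; row < 4; ++row) {
            for (int col = 0; col < 4; ++col) {
                float sum = 0.0f;
                for (int k = 0; k < 4; ++k)
                    sum += a[row * 4 + k] * b[k * 4 + col];
                r[row * 4 + col] = sum;
            }
        }
    }
}
```

With something like this, the managed side crosses the P/Invoke boundary once per batch instead of once per matrix, so the fixed call overhead is spread over `count` multiplications.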

Note also that memory accessed with aligned SSE instructions must sit on a 16-byte boundary, and of course C# doesn't guarantee that kind of alignment :) In particular, you will have to deal with the garbage collector, which loves to move memory when needed. You would need to use pinned arrays or unmanaged memory.
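As a rough illustration of the alignment requirement on the native side, the sketch below checks whether a pointer is 16-byte aligned and declares a matrix type that the compiler aligns for you. IsAligned16, RequireAligned16, and Matrix4x4 are hypothetical names introduced here, not part of the original DLL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns 1 when p sits on a 16-byte boundary, 0 otherwise.
extern "C" int IsAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0 ? 1 : 0;
}

// Debug-only guard a native entry point could call on pointers it receives
// from managed code before using aligned loads and stores on them.
inline void RequireAligned16(const void* p)
{
    assert(IsAligned16(p) == 1);
}

// A 4x4 float matrix whose storage the compiler places on a 16-byte boundary.
struct alignas(16) Matrix4x4 {
    float m[16];
};

static_assert(alignof(Matrix4x4) == 16, "Matrix4x4 must be 16-byte aligned");
```

A check like this only detects misaligned input; buffers handed over from C# still have to be pinned or allocated in unmanaged memory so that the address stays valid and aligned for the duration of the call.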

Salvatore Previti
  • Thanks for the response; in a few hours I can test the FreeLibrary bug fix. – Luca Oct 22 '11 at 05:33
  • Before discarding the SSE optimization, I'd like to profile it, thank you. I have already taken care of memory alignment. Right now I'm using plain unsafe C# code, but since this code runs very frequently I want to give it a try. – Luca Oct 22 '11 at 05:36
  • Sure, of course, try it yourself :) As I said, your approach is perfect in performance terms if you have a lot of multiplications to do (i.e. two big input arrays), but in that case you should write the loop in the C code, adding a "count" parameter to your C function. – Salvatore Previti Oct 22 '11 at 12:24
  • Yes, the SSE code doesn't give the expected performance boost, but I suppose it at least offloads the CPU. Additionally, I can combine multiple operations into a single P/Invoke call: in that case the performance increases a little. Thank you. – Luca Oct 24 '11 at 14:51