Slow communication using shared memory between user mode and kernel

Question

I am running a thread in the Windows kernel that communicates with an application over shared memory. Everything works except that the communication is slow because of a Sleep loop. I have looked into spin locks, mutexes, and the interlocked functions but can't figure out which one fits. I have also considered Windows events but don't know how they perform. Please advise on a faster solution that keeps the communication over shared memory, possibly using Windows events.

KERNEL CODE

typedef struct _SHARED_MEMORY
{
    BOOLEAN mutex;
    CHAR data[BUFFER_SIZE];
} SHARED_MEMORY, *PSHARED_MEMORY;

ZwCreateSection(...)
ZwMapViewOfSection(...)

while (TRUE) {
    if (((PSHARED_MEMORY)SharedSection)->mutex == TRUE) {
      //... do work...
      ((PSHARED_MEMORY)SharedSection)->mutex = FALSE;
    }
    KeDelayExecutionThread(KernelMode, FALSE, &PollingInterval);
}

APPLICATION CODE

OpenFileMapping(...)
MapViewOfFile(...)

...

RtlCopyMemory(&SM->data, WriteData, Size);
SM->mutex = TRUE;

while (SM->mutex != FALSE) {
    Sleep(1); // Slow and removing it will cause an infinite loop
}

RtlCopyMemory(ReadData, &SM->data, Size);

UPDATE 1 Currently this is the fastest solution I have come up with:

while(InterlockedCompareExchange(&SM->mutex, FALSE, FALSE));

However, I find it odd that an exchange is required and that there is no function that only compares.

score 1 · Accepted Answer · answered Mar 06 '19 at 08:30

You don't want to use InterlockedCompareExchange. It burns the CPU, saturates core resources that might be needed by another thread sharing that physical core, and can saturate inter-core buses.

You do need to do two things:

1) Write an InterlockedGet function and use it.

2) Prevent the loop from burning CPU resources and from taking the mother of all mispredicted branches when it finally gets unblocked.

For 1, this is known to work on all compilers that support InterlockedCompareExchange, at least last time I checked:

__inline static int InterlockedGet(int *val)
{
    return *((volatile int *)val);
}
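Where C11 atomics are available, the same read can be written without relying on compiler-specific `volatile` behavior. The sketch below is an assumption layered on the answer, not part of it: `flag_get` and `flag_set` are hypothetical names, and it presumes the flag is declared as `atomic_int` rather than `BOOLEAN`. The acquire load pairs with a release store so that data written before the flag changes is visible to a reader that observes the change.

```c
#include <stdatomic.h>

/* Hypothetical helper: read the flag without modifying it.
   Acquire ordering keeps later reads of the buffer from being
   reordered before this load. */
static inline int flag_get(atomic_int *flag)
{
    return atomic_load_explicit(flag, memory_order_acquire);
}

/* Hypothetical helper: publish a new flag value.
   Release ordering makes earlier buffer writes visible to a
   reader that sees the new value through flag_get. */
static inline void flag_set(atomic_int *flag, int value)
{
    atomic_store_explicit(flag, value, memory_order_release);
}
```

This is only a portable illustration; whether `<stdatomic.h>` is usable depends on the compiler and build settings, and it may not be an option in a kernel driver.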

For 2, put this as the body of the wait loop:

__asm
{
    rep nop
}

For x86 CPUs, this is specified to solve the resource saturation and branch prediction problems.

Putting it together:

while ((*(volatile int *) &SM->mutex) != FALSE) {
    __asm
    {
        rep nop
    }
}

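Change int to match the flag's actual type. In the question's struct, mutex is a BOOLEAN (one byte), so reading it through an int pointer would also read the first bytes of data.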
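MSVC does not accept inline `__asm` when targeting x64, so the loop needs an intrinsic there. Here is a minimal sketch, assuming an x86 or x64 target where `_mm_pause()` is available (it emits the same `pause` instruction that `rep nop` encodes). `cpu_relax` and `spin_until_clear` are hypothetical names, the flag is assumed to be one byte wide to match `BOOLEAN`, and the spin budget is an addition that lets the caller fall back to something else instead of spinning forever.

```c
#include <stddef.h>
#if defined(_MSC_VER)
#include <intrin.h>      /* _mm_pause on MSVC */
#elif defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>   /* _mm_pause on GCC/Clang */
#endif

/* Hypothetical helper: tell the CPU we are in a spin-wait loop. */
static inline void cpu_relax(void)
{
#if defined(_M_IX86) || defined(_M_X64) || defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#endif
    /* Other architectures: no hint in this sketch. */
}

/* Spin until *flag reads as zero or the budget runs out.
   Returns 1 if the flag cleared, 0 if max_spins was exhausted. */
static int spin_until_clear(const volatile unsigned char *flag, size_t max_spins)
{
    size_t i;
    for (i = 0; i < max_spins; ++i) {
        if (*flag == 0)
            return 1;
        cpu_relax();
    }
    return 0;
}
```

On Windows, the `YieldProcessor()` macro from the SDK headers can stand in for `cpu_relax`. A caller might try `spin_until_clear(&SM->mutex, some_budget)` first and only sleep or block if it returns 0; the right budget is workload-dependent and is not something the original answer specifies.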

answered Mar 06 '19 at 08:30

David Schwartz

Great answer, thanks! One question: I noticed you didn't call the inline function but instead just used the volatile pointer cast. I understand it's the same thing but want to make sure. Also, the architecture is 64-bit, so __asm won't work. What's the equivalent for 64-bit? Perhaps YieldProcessor()? – illion Mar 06 '19 at 09:01
Yes, you can use the inline function or just splice it in, whatever's more convenient. See [here](https://learn.microsoft.com/en-us/windows/desktop/api/winnt/nf-winnt-yieldprocessor) for information about `YieldProcessor` and its various implementations. One of those should work for you. (The comment is misleading. It will also fix issues with power consumption and branch prediction whether or not hyper-threading is being used. The comment tells you what MS guarantees it will do. Intel and AMD specify that it will do a lot more.) – David Schwartz Mar 06 '19 at 09:04
Awesome, thanks. I will accept this now, as you have given a perfect answer. I have another question about the kernel code: do you have any comments on the wait procedure KeDelayExecutionThread, or should I use KeStallExecutionProcessor to get a pause of around 50-100 microseconds? Should I open a new question? – illion Mar 06 '19 at 09:15
@illion Probably, since I don't know that much about Windows kernel mode. – David Schwartz Mar 06 '19 at 15:06
