
This is a follow-up to the question "How is std::atomic<T>::notify_all ordered?"

What would the answer to that question be if I used WaitOnAddress or futex directly?


The answer to that question concludes that the program below does not hang and that C++ provides the necessary guarantees, although the standard's wording may still raise questions:

#include <atomic>
#include <chrono>
#include <thread>

int main()
{
    std::atomic<bool> go{ false };

    std::thread thd([&go] {
        go.wait(false, std::memory_order_relaxed); // (1)
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(400));

    go.store(true, std::memory_order_relaxed); // (2)
    go.notify_all();                           // (3)

    thd.join();

    return 0;
}

Now let's consider a translation of this program to the pure Windows API, without C++20 or even C++11:

#include <Windows.h>

#pragma comment(lib, "Synchronization.lib")

volatile DWORD go = 0;

DWORD CALLBACK ThreadProc(LPVOID)
{
    DWORD zero = 0;
    while (go == zero)
    {
        WaitOnAddress(&go, &zero, sizeof(DWORD), INFINITE); // (1)
    }
    return 0;
}

int main()
{
    HANDLE thread = CreateThread(NULL, 0, ThreadProc, NULL, 0, NULL);

    if (thread == 0)
    {
        return 1;
    }

    Sleep(400);

    go = 1;                       // (2)
    WakeByAddressAll((PVOID)&go); // (3) cast drops volatile; the parameter is a plain PVOID

    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);

    return 0;
}

I've added the loop because WaitOnAddress can wake up spuriously.

So, the same question here: if (2) and (3) are observed in reverse order at (1), the program may hang due to a lost notification. Does the Windows API prevent that, or do I need to put fences in explicitly?


The practical application of the answer to this question is an implementation of std::atomic<T>::wait on the Windows platform, either in a standard library or in a substitute for it.

I have the same question about futex, in the context of a Linux implementation.
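For comparison, here is a Linux-only sketch of the same program using the raw futex syscall. It is an illustration under stated assumptions, not a reference implementation:

- glibc provides no `futex()` wrapper, so the hypothetical helpers `futex_wait` and `futex_wake_all` call `syscall(SYS_futex, ...)` directly.
- It assumes `std::atomic<int>` has the same representation as `int`. Only the size is checked.
- The `std::atomic_thread_fence(std::memory_order_seq_cst)` between (2) and (3) is a conservative addition. It is not a claim that the kernel requires it; whether it is needed is exactly what is being asked.

```cpp
// Linux-only sketch: the program above, translated to the raw futex syscall.
#include <atomic>
#include <chrono>
#include <climits>        // INT_MAX
#include <thread>
#include <linux/futex.h>  // FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE
#include <sys/syscall.h>  // SYS_futex
#include <unistd.h>       // syscall()

// Assumption: the kernel can treat the atomic's storage as a plain int.
static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "assumes std::atomic<int> has the same size as int");

// Hypothetical helper: blocks only if *addr still equals `expected`
// (the kernel performs that comparison atomically with going to sleep).
static long futex_wait(std::atomic<int>* addr, int expected)
{
    return syscall(SYS_futex, reinterpret_cast<int*>(addr), FUTEX_WAIT_PRIVATE,
                   expected, nullptr, nullptr, 0);
}

// Hypothetical helper: wakes every thread blocked on addr.
static long futex_wake_all(std::atomic<int>* addr)
{
    return syscall(SYS_futex, reinterpret_cast<int*>(addr), FUTEX_WAKE_PRIVATE,
                   INT_MAX, nullptr, nullptr, 0);
}

// Returns true once the waiter thread has observed go != 0 and been joined.
bool run_futex_demo()
{
    std::atomic<int> go{0};

    std::thread thd([&go] {
        // Loop: FUTEX_WAIT can return spuriously, with EINTR, or with
        // EAGAIN if the value changed before the thread went to sleep.
        while (go.load(std::memory_order_relaxed) == 0) // (1)
            futex_wait(&go, 0);
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(400));

    go.store(1, std::memory_order_relaxed);              // (2)
    // Conservative, not claimed to be required: a full fence so that the
    // store cannot be reordered with anything the wake path reads.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    futex_wake_all(&go);                                 // (3)

    thd.join();
    return go.load() == 1;
}
```

Calling `run_futex_demo()` (a hypothetical name) should return `true` after roughly 400 ms.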

Alex Guteniev
  • You might have seen this already, but make sure you google for "wakebyaddress oldnewthing". Raymond Chen has a series of posts on using this facility (e.g. this [one](https://devblogs.microsoft.com/oldnewthing/20160826-00/?p=94185)). – Christian.K Jun 11 '20 at 07:22
  • @Christian.K, I saw them some time ago, but thanks for the reminder anyway. The one you linked contains an interesting reminder about not relying on undocumented implementation details. That's why I ask this question. Basically, the debugger shows that the functions do have all the fences needed, so the program will not hang, but I'd rather not take a dependency on this observation in case it breaks in the future or on other platforms. – Alex Guteniev Jun 11 '20 at 07:40
  • On x86 this is pretty obviously fine; anything `WakeByAddressAll(&go)` plausibly might do will involve a store somewhere in the kernel read by a load on other cores, therefore (by x86's strong memory model) creating release/acquire sync. Any sane design on other ISAs would similarly involve communication between cores, and probably not just `mo_consume`. The possible weakness in the standard's wording is that it arguably *allows* for hypothetical DeathStation 9000 implementations, not that real-life implementations might actually have this problem. – Peter Cordes Jun 11 '20 at 09:20
  • @PeterCordes, `WakeByAddressAll` actually first does **load** to check if there are any waiters. That's why I think the fence I observed is really needed. – Alex Guteniev Jun 11 '20 at 09:23
  • In fact, all Windows synchronization APIs work as a full memory barrier. You could, for example, use `SetEvent` instead of `WakeByAddressAll` (and `WaitForSingleObject` instead of `WaitOnAddress`) and still ask whether `go = 1;` and `SetEvent` can be reordered. Whether this is formally documented, I don't know. – RbMm Jun 11 '20 at 09:38
  • Thanks for the `SetEvent` example. However, `SetEvent` is documented on the [Synchronization and Multiprocessor Issues](https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-and-multiprocessor-issues#memory-ordering) page. Find the "_Functions that signal synchronization objects_" line. An _address_, on the other hand, does not count as _a synchronization object_. – Alex Guteniev Jun 11 '20 at 09:50
  • @RbMm, also, you can see a successful `SetEvent` on an auto-reset event as "a release of a binary semaphore", and on a manual-reset event as a sort of "release of an infinite-ary semaphore", so the _release_ semantics of `SetEvent` are _a bit more obvious_. – Alex Guteniev Jun 11 '20 at 09:52
  • @AlexGuteniev: But that load isn't the *only* thing it does. Are you worried about racing with a waiter that *just started waiting* as you're notifying? A full barrier (e.g. a seq_cst store to go) might possibly be needed to prevent that, but RbMm says WinAPI calls work as full barriers anyway. – Peter Cordes Jun 11 '20 at 10:00
  • @PeterCordes, yes, I mean this situation. And I see a `seq_cst` fence inside (`lock or [esp],0`). My question is whether Windows could remove this fence and rely on me to order my variable. Maybe most of the time they expect the notified address to be modified by an exchange (which is by itself a proper fence on x86), but I use a plain store (for an SPSC queue). – Alex Guteniev Jun 11 '20 at 10:04
  • I don't expect that Windows could remove fences entirely, for exactly the reason you're worried about. Presumably it's there for correctness, and it's expected that software relies on it. (Also note that your 400ms sleep makes the problem basically impossible in your example, though.) – Peter Cordes Jun 11 '20 at 10:11
  • From your link: *The following synchronization functions use the appropriate barriers to ensure memory ordering:.. Functions that signal synchronization objects..Wait functions*, and see [*Wait functions*](https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-functions#wait-functions). So in fact you don't need any additional code to prevent reordering. A formal proof is another question. – RbMm Jun 11 '20 at 10:23
  • @RbMm, OK, I think that's the answer I'd accept. – Alex Guteniev Jun 11 '20 at 11:09
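The race discussed in the comments is the classic store-buffering pattern. On one side, the waker first loads something like a waiter count. On the other, a waiter that has just registered re-checks `go` before sleeping. The portable C++11 sketch below models this with a hypothetical `waiters` flag. It does not describe how `WakeByAddressAll` or futex are actually implemented; their internals are not documented.

A wakeup is lost only if both of these happen:

- the waker sees no waiters, and
- the waiter sees `go == 0`.

With a `seq_cst` fence between the store and the load on *both* sides, the C++ memory model forbids that outcome. With plain relaxed operations and no fences it is allowed, and x86 store buffers can produce it.

```cpp
#include <atomic>
#include <functional> // std::ref
#include <thread>

// Hypothetical model of one notify/wait race.
struct Trial {
    std::atomic<int> go{0};       // the notified variable, as in the question
    std::atomic<int> waiters{0};  // hypothetical "is anyone waiting?" flag
    int waker_saw_waiters = -1;   // written only by the waker thread
    int waiter_saw_go = -1;       // written only by the waiter thread
};

static void waker(Trial& t)
{
    t.go.store(1, std::memory_order_relaxed);                        // (2)
    std::atomic_thread_fence(std::memory_order_seq_cst);             // full barrier
    t.waker_saw_waiters = t.waiters.load(std::memory_order_relaxed); // "any waiters?" check
}

static void waiter(Trial& t)
{
    t.waiters.store(1, std::memory_order_relaxed);                   // register as a waiter
    std::atomic_thread_fence(std::memory_order_seq_cst);             // full barrier
    t.waiter_saw_go = t.go.load(std::memory_order_relaxed);          // re-check before sleeping
}

// Counts trials in which the notification would have been lost:
// the waker skips the wake AND the waiter decides to sleep.
int count_lost_wakeups(int trials)
{
    int lost = 0;
    for (int i = 0; i < trials; ++i) {
        Trial t;
        std::thread a(waker, std::ref(t));
        std::thread b(waiter, std::ref(t));
        a.join(); // join() makes the plain int results visible here
        b.join();
        if (t.waker_saw_waiters == 0 && t.waiter_saw_go == 0)
            ++lost;
    }
    return lost;
}
```

With both fences in place, `count_lost_wakeups(1000)` should return 0. A test run cannot prove the race is absent; it only exercises the pattern that the fences are there to forbid.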

1 Answer


From the Synchronization and Multiprocessor Issues page of the Windows documentation:

Memory Ordering

[...]

The following synchronization functions use the appropriate barriers to ensure memory ordering:

  • Functions that enter or leave critical sections
  • Functions that signal synchronization objects
  • Wait functions
  • Interlocked functions

On the Synchronization Functions page, the Wait functions section lists WaitOnAddress, WakeByAddressAll, and WakeByAddressSingle. They are also listed on the Wait Functions page under Waiting on an Address.

So all three functions count as wait functions and therefore use the appropriate barriers. The documentation does not define exactly what an "appropriate barrier" is. Still, it is hard to imagine a barrier that would fail to prevent the situation in question yet still count as "appropriate".

So the situation in question is prevented by these barriers.

Alex Guteniev