It seems that Intel's Transactional Synchronization Extensions (TSX-NI) work on a per-CPU basis.
This applies both to the Hardware Lock Elision (HLE) functions, `_InterlockedXxx_HLE{Acquire,Release}`, and to the Restricted Transactional Memory (RTM) functions, `_xbegin`/`_xend`/etc.
What is the "proper" way to use these functions on multi-core systems?
Given their correctness guarantees, I assume I only need to worry about performance here.
So how should I structure my code to get the best performance, given that a thread can be migrated to another core at any moment, in which case these instructions might have to fall back to slower code paths?
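For reference, the "try a transaction, retry a few times, then fall back to a real lock" structure this question is about usually looks something like the sketch below. It is a minimal illustration only, assuming GCC or Clang on x86 (`<immintrin.h>`, `<cpuid.h>`, and `__attribute__((target("rtm")))`); the names `fallback_lock`, `try_increment_rtm`, `run_demo` and the limit of 3 attempts are hypothetical choices, not part of any API. It covers only the RTM intrinsics, not HLE, and on CPUs without RTM (or with TSX disabled) it simply takes the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Hypothetical limits chosen for illustration only. */
enum { MAX_THREADS = 16, MAX_RTM_ATTEMPTS = 3 };

static atomic_int fallback_lock;  /* 0 = free, 1 = held */
static long counter;              /* shared data guarded by the (elided) lock */
static bool use_rtm;
static long iters_per_thread;

/* Runtime check: CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support. */
static bool cpu_has_rtm(void) {
#ifdef HAVE_X86
    unsigned a, b, c, d;
    if (__get_cpuid_max(0, NULL) < 7) return false;
    __cpuid_count(7, 0, a, b, c, d);
    (void)a; (void)c; (void)d;
    return (b >> 11) & 1u;
#else
    return false;
#endif
}

static void lock_fallback(void) {
    while (atomic_exchange_explicit(&fallback_lock, 1, memory_order_acquire))
        while (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
            ; /* spin until the lock looks free, then retry the exchange */
}

static void unlock_fallback(void) {
    atomic_store_explicit(&fallback_lock, 0, memory_order_release);
}

#ifdef HAVE_X86
/* Try the increment as an RTM transaction; returns false if it never commits. */
__attribute__((target("rtm")))
static bool try_increment_rtm(void) {
    for (int attempt = 0; attempt < MAX_RTM_ATTEMPTS; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Reading the lock puts it in the read set: if another thread
               takes the fallback lock, this transaction aborts. */
            if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                _xabort(0xff);
            ++counter;
            _xend();
            return true;
        }
        /* Aborted (conflict, interrupt, context switch, ...): retry only if
           the hardware hints that a retry may succeed. */
        if (!(status & _XABORT_RETRY)) break;
    }
    return false;
}
#endif

static void increment(void) {
#ifdef HAVE_X86
    if (use_rtm && try_increment_rtm()) return;
#endif
    lock_fallback();   /* slow path: an ordinary spinlock */
    ++counter;
    unlock_fallback();
}

static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < iters_per_thread; ++i) increment();
    return NULL;
}

/* Runs nthreads workers, each doing iters increments; returns the final count. */
static long run_demo(int nthreads, long iters) {
    pthread_t tids[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    use_rtm = cpu_has_rtm();
    iters_per_thread = iters;
    counter = 0;
    for (int t = 0; t < nthreads; ++t) pthread_create(&tids[t], NULL, worker, NULL);
    for (int t = 0; t < nthreads; ++t) pthread_join(tids[t], NULL);
    return counter;
}
```

Calling `run_demo(4, 100000)` should return 400000 regardless of how many transactions commit, because an aborted transaction's increment is rolled back and then redone under the lock. Reading `fallback_lock` inside the transaction is what keeps the fast and slow paths mutually exclusive.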
For example, should I try to set thread CPU affinities explicitly, or is that bad practice?
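For concreteness, explicit pinning on Linux would look roughly like the sketch below. It assumes glibc (`pthread_setaffinity_np` and the `CPU_*` macros are GNU extensions, hence `_GNU_SOURCE`); the helper name `pin_to_first_allowed_cpu` and the choice of the lowest-numbered allowed CPU are hypothetical, for illustration only, and say nothing about whether pinning actually helps TSX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the lowest-numbered CPU it
   is currently allowed to run on (an arbitrary choice for illustration).
   Returns that CPU index, or -1 on failure. Linux/glibc only. */
static int pin_to_first_allowed_cpu(void) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    /* pid 0 means "the calling thread" here. */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        /* Returns 0 on success or an error number (it does not set errno). */
        if (pthread_setaffinity_np(pthread_self(), sizeof one, &one) != 0)
            return -1;
        return cpu;
    }
    return -1;
}
```

Restricting the thread to the CPUs it was already allowed to use avoids failures in containers or cgroups where CPU 0 may not be available.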
Is there anything else I should worry about?