Question
While implementing a simple semaphore, I tried measuring the average time spent in the busy-wait loop and then sleeping for that many nanoseconds. The exact duration turned out not to matter: sleeping for just 1 nanosecond seemed to achieve the same result. There is a huge speed-up (about 6x faster) when I call...
std::this_thread::sleep_for(std::chrono::nanoseconds(1));
...in the busy-wait loop of the semaphore's P() function.
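One thing that may be worth checking alongside this is how long the call actually blocks. The standard only promises that sleep_for blocks for at least the requested duration, so a 1 ns request can take considerably longer in practice. Below is a minimal sketch for measuring that; the helper name average_sleep_duration is my own and not part of the demo code, and the numbers will depend on the OS and scheduler.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (not in the original demo): returns the average
// wall-clock time of one sleep_for(1ns) call, measured over `samples` calls.
inline std::chrono::nanoseconds average_sleep_duration( int samples )
{
    assert( samples > 0 );
    auto const start = std::chrono::steady_clock::now();
    for ( int i = 0; i < samples; ++i )
        std::this_thread::sleep_for( std::chrono::nanoseconds( 1 ) );
    auto const total = std::chrono::steady_clock::now() - start;
    // Integer division of the total by the sample count.
    return std::chrono::duration_cast<std::chrono::nanoseconds>( total ) / samples;
}
```

Printing average_sleep_duration( 1000 ).count() gives the average in nanoseconds, which can then be compared against the time a thread would otherwise spend spinning.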
I believe that the sleep is greatly reducing contention, but I'd like someone more knowledgeable to give me a more certain answer.
Using std::this_thread::yield() instead of the sleep also runs faster than plain spinning, but CPU usage stays high (90-100%). The sleep version keeps CPU usage at 10-15%. The threads still interleave, so it isn't the case that one thread simply executes all of its instructions in a row.
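To make the three variants (plain spin, yield, sleep) easier to compare, here is a rough sketch that swaps the waiting step through a template parameter and counts how many acquire attempts fail. The failed-attempt count is only a crude proxy for contention. The names spin_wait, yield_wait, sleep_wait, policy_semaphore, run_result and run_contended are all made up for this sketch and do not appear in the demo code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical wait policies: what a waiter does after a failed acquire.
struct spin_wait  { static void wait() {} };  // plain busy-wait
struct yield_wait { static void wait() { std::this_thread::yield(); } };
struct sleep_wait
{
    static void wait() { std::this_thread::sleep_for( std::chrono::nanoseconds( 1 ) ); }
};

template <typename WaitPolicy>
class policy_semaphore
{
public:
    explicit policy_semaphore( int max_count ) : count_{ max_count } {}

    // Returns how many acquire attempts failed before this call succeeded.
    long long P()
    {
        long long failed = 0;
        while ( !try_decrease_count() )
        {
            ++failed;
            WaitPolicy::wait();
        }
        return failed;
    }

    void V() { count_.fetch_add( 1 ); }
    int count() const { return count_.load(); }

private:
    bool try_decrease_count()
    {
        int old_count{ count_.load() };
        do
        {
            if ( old_count == 0 ) return false;
        } while ( !count_.compare_exchange_strong( old_count, old_count - 1 ) );
        return true;
    }

    std::atomic<int> count_;
};

struct run_result
{
    int final_count;            // semaphore count after all threads finished
    long long failed_attempts;  // total failed acquire attempts across threads
};

// Each thread does `iterations` P()/V() pairs with nothing in between.
template <typename WaitPolicy>
run_result run_contended( int max_count, int threads, int iterations )
{
    assert( max_count > 0 && threads > 0 );
    policy_semaphore<WaitPolicy> sem{ max_count };
    std::atomic<long long> failed_total{ 0 };

    std::vector<std::thread> workers;
    for ( int t = 0; t < threads; ++t )
        workers.emplace_back( [&sem, &failed_total, iterations] {
            long long failed_local = 0;  // summed locally to avoid extra shared writes
            for ( int i = 0; i < iterations; ++i )
            {
                failed_local += sem.P();
                sem.V();
            }
            failed_total.fetch_add( failed_local );
        } );
    for ( auto& w : workers ) w.join();

    return { sem.count(), failed_total.load() };
}
```

Calling run_contended<spin_wait>, run_contended<yield_wait> and run_contended<sleep_wait> with the same arguments and comparing failed_attempts should give a first impression of how often each variant retries; the actual counts vary from run to run.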
Code
I've minimized this example as much as possible to highlight the relevant part of code. I've provided a demo link with the full implementation at the end of the question.
#include <atomic>
#include <chrono>
#include <thread>

struct semaphore
{
public:
    explicit semaphore( int const max_count );

    void P()
    {
        // why does this sleep improve performance so much?
        while ( !try_decrease_count() )
            std::this_thread::sleep_for( std::chrono::nanoseconds( 1 ) );
    }

    void V()
    {
        count_.fetch_add( 1 );
    }

private:
    // attempts one decrement; returns false if the count is already zero
    bool try_decrease_count();

    std::atomic<int> count_;
};

semaphore::semaphore( int const max_count )
    : count_{ max_count }
{}

bool semaphore::try_decrease_count()
{
    int old_count{ count_.load() };
    do
    {
        if ( !old_count ) return false;
        // on failure, compare_exchange_strong reloads old_count with the current value
    } while ( !count_.compare_exchange_strong( old_count, old_count - 1 ) );
    return true;
}
Demonstrations
These include the full code along with tests.
Demo with sleep_for: http://coliru.stacked-crooked.com/a/ff16411d9a884556
Demo without sleep_for: http://coliru.stacked-crooked.com/a/2df776f0a03425b1
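For anyone who wants to reproduce the timing locally rather than on Coliru, a helper along these lines could be used. It is only a sketch: time_threads and example_run are hypothetical names, and the workload in example_run is a shared counter standing in for the P()/V() pairs, not the demo's actual test.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: runs `work` on `threads` threads and returns the
// wall-clock time from just before the first launch to just after the last join.
template <typename Work>
std::chrono::nanoseconds time_threads( int threads, Work work )
{
    auto const start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for ( int t = 0; t < threads; ++t )
        workers.emplace_back( work );  // each thread gets its own copy of the callable
    for ( auto& w : workers ) w.join();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start );
}

// Stand-in workload: each thread bumps a shared counter `iterations` times.
// With the semaphore from the question, the loop body would be P() then V().
inline std::chrono::nanoseconds example_run( int threads, int iterations )
{
    std::atomic<int> hits{ 0 };
    auto const elapsed = time_threads( threads, [&hits, iterations] {
        for ( int i = 0; i < iterations; ++i )
            hits.fetch_add( 1 );
    } );
    assert( hits.load() == threads * iterations );
    return elapsed;
}
```

The measured time includes thread creation and joining, so it is most meaningful when the iteration count is large enough for that overhead to be negligible.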