After 5 days working on this (3 major rewrites), I'm asking for help from you wise people on StackOverflow. I've used atomic ints before, but not this extensively.
THE SCENARIO: I have 1..1000+ batches of tests, each batch containing tests labeled A..Z. Batch 2 can't be processed until the test results of batch 1 are known, and so on.
1) A B C..Z
2) A B C..Z
.
.
99)
100++) A B C..Z
STAGE 3 (step three) uses a global atomic int to hand each thread a local index via fetch_add, instead of a do-while loop that searches for the next free test. This removes the need to loop over the tests and check whether each one's lock is free. It works beautifully.
void Stage3(int ThreadsLevel)
{
    int index;
    CurrentWorkers.fetch_add(1); //Workers Entering
    index = IndexManager.fetch_add(1); //Assign Thread His Job (A..Z)
    do
    {
        //
        // BODY OF ALL THE WORK TO DO
        //
        index = IndexManager.fetch_add(1);
    } while (index < NumberOfTest);
    CurrentWorkers.fetch_sub(1); //Workers Exiting
}
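The index-claiming idea can be sketched in isolation. This is only a minimal illustration, not the code above: `nextIndex`, `timesProcessed`, `kNumberOfTests`, `Worker`, and `RunBatch` are hypothetical names, and the loop checks each claimed index before running the body, so an index past the end never reaches the work.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical names for illustration; not the globals from the post.
constexpr int kNumberOfTests = 26;                 // tests A..Z
std::atomic<int> nextIndex{0};                     // plays the role of IndexManager
std::atomic<int> timesProcessed[kNumberOfTests];   // one counter per test

void Worker()
{
    // Claim an index, check it, and only then run the body.
    for (int index = nextIndex.fetch_add(1); index < kNumberOfTests;
         index = nextIndex.fetch_add(1))
    {
        timesProcessed[index].fetch_add(1);        // stand-in for the real work
    }
}

// Runs one batch on threadCount threads and reports whether every test
// was processed exactly once.
bool RunBatch(int threadCount)
{
    nextIndex.store(0);
    for (auto& counter : timesProcessed)
        counter.store(0);

    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back(Worker);
    for (auto& t : threads)
        t.join();

    for (auto& counter : timesProcessed)
        if (counter.load() != 1)
            return false;
    return true;
}
```

Because fetch_add hands out each value once, no two threads can claim the same test, whatever the thread count.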
STAGE 2 (a gate) stops straggler threads (Windows might borrow a thread and return it late) from joining in when the work is almost done, and sends each thread forward to begin the next batch (free-flowing threads) once the final thread is done. I know exactly when the work is done by using an atomic int: CurrentWorkers.fetch_add(1) on entry to Stage 3 and CurrentWorkers.fetch_sub(1) on exit.
Uses: IndexManager.load() < NumberOfTest to see whether any tests are still unassigned. If all are assigned and only waiting to complete, the thread is blocked from Stage 3 and sent forward.
Uses: CurrentWorkers.load() == 0 to decide when to reset the index with IndexManager.store(0).
void Stage2(int ThreadsLevel)
{
    if (ThreadsLevel == CurrentLevel.load()) // a straggler from a lower level skips ahead to wait for the next batch
    {
        if (IndexManager.load() < NumberOfTest) // otherwise all tests are assigned: go wait for the next batch
        {
            Stage3(ThreadsLevel);
            if (CurrentWorkers.load() == 0) // no workers left inside Stage3: reset the index
            {
                if (IndexManager.load() > 0)
                    IndexManager.store(0);
            }
        }
    }
}
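A separate load after the decrement can be observed as 0 by more than one thread, or by none at the moment it matters. A minimal sketch of one alternative, assuming every worker is registered before any of them starts (which differs from the fetch_add-on-entry scheme above), is to use the value that fetch_sub returns. `workersInside`, `resetCount`, `EnterAndLeave`, and `CountResets` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical names for illustration; not the globals from the post.
std::atomic<int> workersInside{0};   // plays the role of CurrentWorkers
std::atomic<int> resetCount{0};      // how many threads ran the "last one out" branch

void EnterAndLeave()
{
    // The real work would go here.

    // fetch_sub returns the value held before the subtraction, so exactly
    // one thread observes 1: the last one to leave.
    if (workersInside.fetch_sub(1) == 1)
        resetCount.fetch_add(1);     // e.g. reset the index counter here
}

// Starts threadCount workers and returns how many of them saw themselves
// as the last one out.
int CountResets(int threadCount)
{
    workersInside.store(threadCount);  // assumption: everyone is registered up front
    resetCount.store(0);

    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back(EnterAndLeave);
    for (auto& t : threads)
        t.join();

    return resetCount.load();
}
```

With the count set up front, it passes through zero only once per batch, so the reset branch runs on a single thread.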
**** I have shown all the atomic variables in use, in case you think this is a compiler-ordering problem or something similar.
STAGE 1, THE PROBLEM: When the test begins, threads are created and sent to a waiting room (Stage1_Wait) to eliminate the costly overhead of creating and ending threads for each iteration of (A..Z). Each thread waits for TestReady.load().
The MAIN thread goes out and gets the samples (A..Z), then sets an atomic int (SamplesReady) to free the threads. For now, it waits for all the threads to return (ActiveThreads) before grabbing the next batch.
void Stage1_Wait() //PROBLEM
{
    std::atomic_int testing;
    while (ProcessingTest.load())
    {
        if (TestReady.load())
        {
            Stage2(ActiveLevel); //ActiveLevel Global Variable passed by value
            SamplesReady.store(0);
            ActiveThreads.fetch_sub(1);
        }
    }
}
void MainThread_ProcessLevel()
{
    ActiveThreads.store(NumberOfThreads);
    SamplesReady.store(1);
    int waiting = 1;
    while (waiting)
    {
        if (!ActiveThreads.load())
            waiting = 0;
    }
}
This works fine with a single thread off the main thread. With 2 threads, it gets past 100 test levels before “locking up”. With more threads, it locks up within 20.
Definition of LOCKING UP: ActiveThreads is NOT decremented down to zero. Threads appear to pass ActiveThreads.fetch_sub(1), but the count does not reflect the updates, so MainThread_ProcessLevel just keeps waiting. The count never reaches 0 and stays stuck.
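One interleaving that would produce this symptom can be replayed on a single thread. This is a sketch under the assumption that the ready flag is shared by all workers and cleared by whichever worker finishes first, as in Stage1_Wait; `ReplayOneInterleaving` is a hypothetical helper, and its local variables only mirror the globals in the post.

```cpp
#include <atomic>

// Hypothetical single-threaded replay of two workers running the body of
// Stage1_Wait, one after the other.
int ReplayOneInterleaving()
{
    std::atomic<int> samplesReady{0};
    std::atomic<int> activeThreads{0};

    // Main thread publishes a batch for two workers.
    activeThreads.store(2);
    samplesReady.store(1);

    // Worker A polls, sees the flag, does its work, then clears the flag.
    if (samplesReady.load())
    {
        samplesReady.store(0);
        activeThreads.fetch_sub(1);
    }

    // Worker B polls only now: the flag is already 0, so it skips the batch
    // and never decrements.
    if (samplesReady.load())
    {
        samplesReady.store(0);
        activeThreads.fetch_sub(1);
    }

    return activeThreads.load();   // the value the main thread keeps polling
}
```

In this replay the count stops at 1, which would leave the main thread waiting forever even though no atomic operation was lost.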
After trying various memory_order options, I said okay, let the threads flow, and had the main thread check for SamplesReady.load() == 0 instead. GUESS WHAT: SamplesReady isn't getting updated, BUT NOW ActiveThreads sees every thread and is decrementing (feels like I'm being pranked by my computer, LOL).
So the PROBLEM seems to be in MainThread_ProcessLevel and Stage1_Wait. Last night I did a total rewrite/redesign of everything (the final version is what you see now). It crept up on me again in the same area. Any ideas?
After 3 days I rewrote/redesigned the code (3 times). I tried various memory-ordering options (after scanning StackOverflow and other sites) and still get the same results in the same area.
UPDATED / ADDED PER REQUEST: a bare-minimum example. I even removed my memory-order experiments, so every operation uses the default, which is the strongest memory order (memory_order_seq_cst). COMMENTS DEMONSTRATE WHERE... FIXED!
#include <iostream>
#include <thread>
#include <atomic>
#include <cmath> // std::sqrt, std::tanh
std::atomic_int ProcessingSamples;
std::atomic_int SamplesReady;
std::atomic_int ActiveThreads;
std::thread ThreadCalls[8];
int NumberOfThreads;
void PretendToProcessSamples()
{
    double StupidMath;
    StupidMath = 1.0;
    StupidMath *= std::sqrt(StupidMath);
    StupidMath += std::tanh(StupidMath);
}
void WaitingRoom()
{
    static int BreakPointInt = 0;
    std::atomic_int testing;
    BreakPointInt++;
    while (ProcessingSamples.load())
    {
        if (SamplesReady.load())
        {
            PretendToProcessSamples();
            SamplesReady.store(0);
            ActiveThreads.fetch_sub(1);
        }
    }
}
void ProcessSamples(int i) // THREADED
{
    int waiting;
    ActiveThreads.store(NumberOfThreads);
    waiting = 1;
    SamplesReady.store(1);
    while (waiting)
    {
        if (!ActiveThreads.load())
            waiting = 0;
    }
}
void StartBackGroundThreads(int NbrOfThreads)
{
    if (NbrOfThreads) // IF 0, do normal operations
    {
        NumberOfThreads = NbrOfThreads;
        ProcessingSamples.store(1);
        SamplesReady.store(0);
        for (int z = 0; z < NumberOfThreads; z++)
        {
            ThreadCalls[z] = std::thread(WaitingRoom);
        }
    }
}
int main()
{
    StartBackGroundThreads(4);
    //for (int b = 0; b < 1000; b++)
    for (int i = 0; i < 4000; i++)
        ProcessSamples(i);
    ProcessingSamples.store(0);
    for (int z = 0; z < NumberOfThreads; z++)
    {
        ThreadCalls[z].join();
    }
}
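For comparison, here is a minimal sketch of a handshake in which each worker handles each batch exactly once. It swaps the shared ready flag for a per-batch generation counter that only the main thread writes; this is a different technique from the code above, not a description of the fix mentioned in the update. `generation`, `remaining`, `running`, `workDone`, `Worker`, and `RunBatches` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical names for illustration; not the globals from the post.
std::atomic<int> generation{0};    // bumped by the main thread once per batch
std::atomic<int> remaining{0};     // workers that have not finished the current batch
std::atomic<bool> running{true};   // cleared to let the workers exit
std::atomic<int> workDone{0};      // total units of pretend work performed

void Worker()
{
    int seen = 0;                              // last generation this thread handled
    while (running.load())
    {
        int current = generation.load();
        if (current != seen)
        {
            seen = current;                    // remember it: one pass per batch
            workDone.fetch_add(1);             // stand-in for the real work
            remaining.fetch_sub(1);            // report completion exactly once
        }
        else
        {
            std::this_thread::yield();         // nothing new yet
        }
    }
}

// Runs batchCount batches on threadCount workers and returns the total
// number of work units performed.
int RunBatches(int threadCount, int batchCount)
{
    running.store(true);
    generation.store(0);
    remaining.store(0);
    workDone.store(0);

    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back(Worker);

    for (int b = 0; b < batchCount; ++b)
    {
        remaining.store(threadCount);          // set the count before publishing
        generation.fetch_add(1);               // publish the batch
        while (remaining.load() != 0)
            std::this_thread::yield();         // wait for every worker
    }

    running.store(false);
    for (auto& t : threads)
        t.join();
    return workDone.load();
}
```

Since no worker ever writes the signal that the other workers are polling, one fast thread cannot hide a batch from the rest, and the main thread does not publish the next batch until every worker has reported in.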