Hi, I am having trouble understanding why the false sharing I expected does not occur in my test code.

I am trying to create a process-wide, unique thread manager that manages multiple threads uniformly.

The unique thread manager class is NOT a thread pool. It works by assigning task functions to designated threads and can retrieve the return value of each task function, rather than just pushing tasks to a queue without regard for which thread runs them. Also, the thread manager does not care about the size (amount of computation) of a task.

The thread manager will be used by a thread (the main thread) to handle the computation parts, and it will be used quite frequently. The reason is that my process will follow the game loop design pattern and I want the game loop to run at over 120 FPS, which means one iteration of the game loop must finish in less than 8.3 milliseconds. The main thread might perform this task assignment a number of times within one iteration, so reducing or eliminating context-switching cost was my primary concern. My conclusion was to have the thread manager's threads spin-wait.

In short, the game loop will repeat the following two steps a number of times.

  1. Main loop assigns tasks to the thread manager.
  2. Main loop waits for the task results from the thread manager.

Below is my test code.

ThreadManager.h

#pragma once

#include <Windows.h> // Interlocked*, CreateThread, Sleep, etc.

namespace YSLibrary
{
    class CThreadManager final
    {
    private:

        static long long s_llLock;

        static unsigned long long s_ullThreadCount;
        static void** s_ppThreads;
        static unsigned long* s_pThreadIDs;
        static long long* s_pThreadQuits;

        static long long* s_pTaskLocks;
        static unsigned long long (**s_ppTasks)();
        static unsigned long long* s_pTaskResults;

        CThreadManager(){}
        ~CThreadManager(){}

        __forceinline static void Lock()
        {
            while (true)
            {
                if (InterlockedCompareExchange64(&s_llLock, 1LL, 0LL) == 0LL)
                {
                    return;
                }

                Sleep(0UL);
            }
        }

        __forceinline static void Unlock()
        {
            InterlockedExchange64(&s_llLock, 0LL);
        }

        static unsigned long __stdcall Thread(void* const _pParameter)
        {
            const unsigned long long ullThreadIndex = reinterpret_cast<const unsigned long long>(_pParameter);

            while (true)
            {
                if (InterlockedCompareExchange64(&s_pThreadQuits[ullThreadIndex], 0LL, 1LL) == 1LL)
                {
                    return 1UL;
                }

                if (InterlockedCompareExchange64(&s_pTaskLocks[ullThreadIndex], 1LL, 0LL) == 0LL)
                {
                    if (s_ppTasks[ullThreadIndex] != nullptr)
                    {
                        s_pTaskResults[ullThreadIndex] = s_ppTasks[ullThreadIndex]();
                        s_ppTasks[ullThreadIndex] = nullptr;
                    }

                    InterlockedExchange64(&s_pTaskLocks[ullThreadIndex], 0LL);
                }
            }
        }

    public:

        enum class EResult : unsigned long long
        {
            None = 0ULL,
            Success = 1ULL,
            Fail_ArgumentNull = 2ULL,
            Fail_ArgumentInvalid = 3ULL,
            Fail_Locked = 4ULL,
            Fail_ThreadCountNotZero = 5ULL,
            Fail_ThreadCountZero = 6ULL,
            Fail_ThreadsNotNull = 7ULL,
            Fail_ThreadsNull = 8ULL,
            Fail_ThreadIDsNotNull = 9ULL,
            Fail_ThreadIDsNull = 10ULL,
            Fail_ThreadQuitsNotNull = 11ULL,
            Fail_ThreadQuitsNull = 12ULL,
            Fail_TaskLocksNotNull = 13ULL,
            Fail_TaskLocksNull = 14ULL,
            Fail_TasksNotNull = 15ULL,
            Fail_TasksNull = 16ULL,
            Fail_TaskResultsNotNull = 17ULL,
            Fail_TaskResultsNull = 18ULL,
            Fail_CreateThread = 19ULL
        };

        __forceinline static EResult Initialize(const unsigned long long _ullThreadCount)
        {
            if (_ullThreadCount == 0ULL)
            {
                return EResult::Fail_ArgumentNull;
            }

            Lock();

            if (s_ullThreadCount != 0ULL)
            {
                Unlock();
                return EResult::Fail_ThreadCountNotZero;
            }

            if (s_ppThreads != nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadsNotNull;
            }

            if (s_pThreadIDs != nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadIDsNotNull;
            }

            if (s_pThreadQuits != nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadQuitsNotNull;
            }

            if (s_pTaskLocks != nullptr)
            {
                Unlock();
                return EResult::Fail_TaskLocksNotNull;
            }

            if (s_ppTasks != nullptr)
            {
                Unlock();
                return EResult::Fail_TasksNotNull;
            }

            if (s_pTaskResults != nullptr)
            {
                Unlock();
                return EResult::Fail_TaskResultsNotNull;
            }

            s_ullThreadCount = _ullThreadCount;
            s_ppThreads = new void*[s_ullThreadCount]{};
            s_pThreadIDs = new unsigned long[s_ullThreadCount]{};
            s_pThreadQuits = new long long[s_ullThreadCount]{};

            s_pTaskLocks = new long long[s_ullThreadCount]{};
            s_ppTasks = new (unsigned long long (*[s_ullThreadCount])()){};
            s_pTaskResults = new unsigned long long[s_ullThreadCount]{};

            for (unsigned long long i = 0ULL; i < s_ullThreadCount; ++i)
            {
                s_ppThreads[i] = CreateThread(nullptr, 0ULL, &Thread, reinterpret_cast<void*>(i), 0UL, &s_pThreadIDs[i]);
                if (s_ppThreads[i] == nullptr)
                {
                    // Rollback
                    for (unsigned long long j = 0ULL; j < i; ++j)
                    {
                        // Signal every thread created so far (index j, not i) to quit.
                        InterlockedExchange64(&s_pThreadQuits[j], 1LL);
                    }

                    unsigned long ulExitCode = 0UL;
                    for (unsigned long long j = 0ULL; j < i; ++j)
                    {
                        while (true)
                        {
                            GetExitCodeThread(s_ppThreads[j], &ulExitCode);
                            if (ulExitCode != static_cast<unsigned long>(STILL_ACTIVE))
                            {
                                CloseHandle(s_ppThreads[j]);
                                s_ppThreads[j] = nullptr;
                                break;
                            }

                            Sleep(0UL);
                        }
                    }

                    delete[] s_pTaskResults;
                    s_pTaskResults = nullptr;

                    delete[] s_ppTasks;
                    s_ppTasks = nullptr;

                    delete[] s_pTaskLocks;
                    s_pTaskLocks = nullptr;

                    delete[] s_pThreadQuits;
                    s_pThreadQuits = nullptr;

                    delete[] s_pThreadIDs;
                    s_pThreadIDs = nullptr;

                    delete[] s_ppThreads;
                    s_ppThreads = nullptr;

                    s_ullThreadCount = 0ULL;

                    Unlock();
                    return EResult::Fail_CreateThread;
                }
            }

            Unlock();
            return EResult::Success;
        }

        __forceinline static EResult Terminate()
        {
            Lock();

            if (s_ullThreadCount == 0ULL)
            {
                Unlock();
                return EResult::Fail_ThreadCountZero;
            }

            if (s_ppThreads == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadsNull;
            }

            if (s_pThreadIDs == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadIDsNull;
            }

            if (s_pThreadQuits == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadQuitsNull;
            }

            if (s_pTaskLocks == nullptr)
            {
                Unlock();
                return EResult::Fail_TaskLocksNull;
            }

            if (s_ppTasks == nullptr)
            {
                Unlock();
                return EResult::Fail_TasksNull;
            }

            if (s_pTaskResults == nullptr)
            {
                Unlock();
                return EResult::Fail_TaskResultsNull;
            }

            for (unsigned long long i = 0ULL; i < s_ullThreadCount; ++i)
            {
                InterlockedExchange64(&s_pThreadQuits[i], 1LL);
            }

            unsigned long ulExitCode = 0UL;
            for (unsigned long long i = 0ULL; i < s_ullThreadCount; ++i)
            {
                while (true)
                {
                    GetExitCodeThread(s_ppThreads[i], &ulExitCode);
                    if (ulExitCode != static_cast<unsigned long>(STILL_ACTIVE))
                    {
                        CloseHandle(s_ppThreads[i]);
                        s_ppThreads[i] = nullptr;
                        break;
                    }

                    Sleep(0UL);
                }
            }

            delete[] s_pTaskResults;
            s_pTaskResults = nullptr;

            delete[] s_ppTasks;
            s_ppTasks = nullptr;

            delete[] s_pTaskLocks;
            s_pTaskLocks = nullptr;

            delete[] s_pThreadQuits;
            s_pThreadQuits = nullptr;

            delete[] s_pThreadIDs;
            s_pThreadIDs = nullptr;

            delete[] s_ppThreads;
            s_ppThreads = nullptr;

            s_ullThreadCount = 0ULL;

            Unlock();
            return EResult::Success;
        }

        __forceinline static EResult Execute(const unsigned long long _ullThreadIndex, unsigned long long (*_pFunction)())
        {
            if (_pFunction == nullptr)
            {
                return EResult::Fail_ArgumentNull;
            }

            Lock();

            if (s_ullThreadCount == 0ULL)
            {
                Unlock();
                return EResult::Fail_ThreadCountZero;
            }

            if (s_ppThreads == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadsNull;
            }

            if (s_pThreadIDs == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadIDsNull;
            }

            if (s_pThreadQuits == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadQuitsNull;
            }

            if (s_pTaskLocks == nullptr)
            {
                Unlock();
                return EResult::Fail_TaskLocksNull;
            }

            if (s_ppTasks == nullptr)
            {
                Unlock();
                return EResult::Fail_TasksNull;
            }

            if (s_pTaskResults == nullptr)
            {
                Unlock();
                return EResult::Fail_TaskResultsNull;
            }

            if (_ullThreadIndex >= s_ullThreadCount)
            {
                Unlock();
                return EResult::Fail_ArgumentInvalid;
            }

            while (true)
            {
                if (InterlockedCompareExchange64(&s_pTaskLocks[_ullThreadIndex], 1LL, 0LL) == 0LL)
                {
                    s_ppTasks[_ullThreadIndex] = _pFunction;

                    InterlockedExchange64(&s_pTaskLocks[_ullThreadIndex], 0LL);
                    Unlock();
                    return EResult::Success;
                }

                Sleep(0UL);
            }
        }

        __forceinline static EResult WaitForResult(const unsigned long long _ullThreadIndex, unsigned long long* const _pFunctionResult)
        {
            if (_pFunctionResult == nullptr)
            {
                return EResult::Fail_ArgumentNull;
            }

            Lock();

            if (s_ullThreadCount == 0ULL)
            {
                Unlock();
                return EResult::Fail_ThreadCountZero;
            }

            if (s_ppThreads == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadsNull;
            }

            if (s_pThreadIDs == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadIDsNull;
            }

            if (s_pThreadQuits == nullptr)
            {
                Unlock();
                return EResult::Fail_ThreadQuitsNull;
            }

            if (s_pTaskLocks == nullptr)
            {
                Unlock();
                return EResult::Fail_TaskLocksNull;
            }

            if (s_ppTasks == nullptr)
            {
                Unlock();
                return EResult::Fail_TasksNull;
            }

            if (s_pTaskResults == nullptr)
            {
                Unlock();
                return EResult::Fail_TaskResultsNull;
            }

            if (_ullThreadIndex >= s_ullThreadCount)
            {
                Unlock();
                return EResult::Fail_ArgumentInvalid;
            }

            while (true)
            {
                if (InterlockedCompareExchange64(&s_pTaskLocks[_ullThreadIndex], 1LL, 0LL) == 0LL)
                {
                    if (s_ppTasks[_ullThreadIndex] == nullptr)
                    {
                        (*_pFunctionResult) = s_pTaskResults[_ullThreadIndex];

                        InterlockedExchange64(&s_pTaskLocks[_ullThreadIndex], 0LL);
                        Unlock();
                        return EResult::Success;
                    }

                    InterlockedExchange64(&s_pTaskLocks[_ullThreadIndex], 0LL);
                }

                Sleep(0UL);
            }
        }
    };
}

main.cpp

#include <cstdlib>  // rand, srand, system
#include <ctime>    // time
#include <iostream>
#include <Windows.h>
#include "ThreadManager.h"

long long YSLibrary::CThreadManager::s_llLock = 0LL;
unsigned long long YSLibrary::CThreadManager::s_ullThreadCount = 0ULL;
void** YSLibrary::CThreadManager::s_ppThreads = nullptr;
unsigned long* YSLibrary::CThreadManager::s_pThreadIDs = nullptr;
long long* YSLibrary::CThreadManager::s_pThreadQuits = nullptr;
long long* YSLibrary::CThreadManager::s_pTaskLocks = nullptr;
unsigned long long (**YSLibrary::CThreadManager::s_ppTasks)() = nullptr;
unsigned long long* YSLibrary::CThreadManager::s_pTaskResults = nullptr;

unsigned long long g_pResults[10]{};

struct SData
{
    unsigned long long ullData[8];
};

SData g_stData{};

SData g_stData0{};
SData g_stData1{};
SData g_stData2{};
SData g_stData3{};
SData g_stData4{};
SData g_stData5{};
SData g_stData6{};

unsigned long long Function()
{
    for (unsigned long long i = 0ULL; i < 70000000ULL; ++i)
    {
        g_stData.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function0()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData0.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function1()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData1.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function2()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData2.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function3()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData3.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function4()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData4.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function5()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData5.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

unsigned long long Function6()
{
    for (unsigned long long i = 0ULL; i < 10000000ULL; ++i)
    {
        g_stData6.ullData[0] = static_cast<unsigned long long>(rand());
    }

    return 1ULL;
}

int main()
{
    unsigned long long ullStartTick = 0ULL;
    unsigned long long ullEndTick = 0ULL;

    srand((unsigned int)time(nullptr));

    ullStartTick = GetTickCount64();

    Function();

    ullEndTick = GetTickCount64();

    std::wcout << L"[Main]" << std::endl;
    std::wcout << ullEndTick - ullStartTick << std::endl;

    YSLibrary::CThreadManager::EResult eResult = YSLibrary::CThreadManager::EResult::None;

    eResult = YSLibrary::CThreadManager::Initialize(7ULL);

    ullStartTick = GetTickCount64();

    eResult = YSLibrary::CThreadManager::Execute(0ULL, &Function0);
    eResult = YSLibrary::CThreadManager::Execute(1ULL, &Function1);
    eResult = YSLibrary::CThreadManager::Execute(2ULL, &Function2);
    eResult = YSLibrary::CThreadManager::Execute(3ULL, &Function3);
    eResult = YSLibrary::CThreadManager::Execute(4ULL, &Function4);
    eResult = YSLibrary::CThreadManager::Execute(5ULL, &Function5);
    eResult = YSLibrary::CThreadManager::Execute(6ULL, &Function6);
    eResult = YSLibrary::CThreadManager::WaitForResult(0ULL, &g_pResults[0]);
    eResult = YSLibrary::CThreadManager::WaitForResult(1ULL, &g_pResults[1]);
    eResult = YSLibrary::CThreadManager::WaitForResult(2ULL, &g_pResults[2]);
    eResult = YSLibrary::CThreadManager::WaitForResult(3ULL, &g_pResults[3]);
    eResult = YSLibrary::CThreadManager::WaitForResult(4ULL, &g_pResults[4]);
    eResult = YSLibrary::CThreadManager::WaitForResult(5ULL, &g_pResults[5]);
    eResult = YSLibrary::CThreadManager::WaitForResult(6ULL, &g_pResults[6]);

    ullEndTick = GetTickCount64();

    std::wcout << L"[Thread Manager]" << std::endl;
    std::wcout << ullEndTick - ullStartTick << std::endl;

    YSLibrary::CThreadManager::Terminate();

    system("pause");

    return 0;
}

I am really sorry about the Interlocked family of functions, __forceinline, the messy declaration of static variables, etc.

On the other hand, the reason I used "long long" for the lock variable is that there was no "bool" version of the Interlocked functions. I also tried "short", but I measured no significant difference in time between "short" and "long long". If anything, "short" was slightly slower, and I guess the reason is the use of 16-bit registers in a 64-bit environment. Also, a bool or short type might lead to memory alignment problems. So I used "long long".
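For comparison with the lock-variable choice above, here is a minimal sketch that swaps the Interlocked functions for the standard C++11 `std::atomic<bool>`, which does allow a `bool` flag. The class name `CSpinLock` and its members are hypothetical and not part of my code; `std::this_thread::yield()` stands in for `Sleep(0UL)`, and the two are not guaranteed to behave identically.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin lock built on std::atomic<bool> instead of
// InterlockedCompareExchange64 on a long long.
class CSpinLock
{
public:
    // Returns true if the lock was free and is now owned by the caller.
    bool TryLock()
    {
        bool bExpected = false;
        return m_bLocked.compare_exchange_strong(bExpected, true, std::memory_order_acquire);
    }

    // Spins until the lock is acquired, yielding between attempts.
    void Lock()
    {
        while (!TryLock())
        {
            std::this_thread::yield();
        }
    }

    // Releases the lock; must only be called by the current owner.
    void Unlock()
    {
        m_bLocked.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> m_bLocked{ false };
};
```

Whether this is faster or slower than the "long long" version would have to be measured; the sketch only shows that a `bool` flag is possible.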

The reason CThreadManager has a private constructor is to explicitly prohibit "new CThreadManager()".

The use of "reinterpret_cast" is minimized. I thought its cost was paid only at compile time, but I saw a question on Stack Overflow saying that it has a runtime cost. I'm not sure about that yet, so I use it just once, when the thread function begins.
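The cast in question is the one that carries the thread index through the `void*` thread parameter. A minimal sketch of that round trip is below; the helper names `IndexToParameter` and `ParameterToIndex` are hypothetical. My assumption, which I have not verified in the generated assembly, is that on a typical 64-bit target each cast is at most a register move, because a pointer and `std::uintptr_t` have the same width. The integer-to-pointer mapping is implementation-defined, so this is not a guarantee from the standard.

```cpp
#include <cstdint>

// Hypothetical helper: packs a thread index into the void* parameter
// that a thread entry point receives.
inline void* IndexToParameter(std::uintptr_t index)
{
    return reinterpret_cast<void*>(index);
}

// Hypothetical helper: recovers the thread index inside the thread function.
inline std::uintptr_t ParameterToIndex(void* pParameter)
{
    return reinterpret_cast<std::uintptr_t>(pParameter);
}
```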

So far, I have confirmed the false sharing phenomenon by changing

SData::ullData[8] -> SData::ullData[1]

Also, using Sleep(0) significantly reduced the thread time slices wasted in WaitForResult() and reduced the total execution time within the threads.

The result of this code showed

[Main]
1828
[Thread Manager]
344

in my environment.

However, I just realized that there are other places besides SData::ullData where false sharing must occur: s_pThreadQuits, s_pTaskLocks, s_ppTasks, and s_pTaskResults.

Why is false sharing not occurring with these variables?

[EDIT]

What I mean by "false sharing" is "memory addresses that are accessed by different threads but share the same cache line". The candidates are

  1. SData g_stDataN (in each FunctionN())
  2. s_pThreadQuits, s_pTaskLocks, s_pTaskResults, s_ppTasks (in Thread())

I thought the variables in 2. would also be loaded into the cache in whole cache lines (64 bytes in my environment), just like g_stDataN. I set the size of SData to 64 bytes in order to get the effect of the "padding" method for avoiding false sharing.
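To make the cache-line reasoning concrete, here is a small sketch that counts how many cache lines a contiguous array touches. The 64-byte line size is taken from my environment, and the names `kCacheLineSize`, `CacheLineIndex`, and `CountCacheLines` are hypothetical helpers, not part of the code above.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size (64 bytes in my environment).
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that contains the given address.
inline std::uintptr_t CacheLineIndex(const void* pAddress)
{
    return reinterpret_cast<std::uintptr_t>(pAddress) / kCacheLineSize;
}

// Number of distinct cache lines covered by `count` contiguous elements.
template <typename T>
std::size_t CountCacheLines(const T* pFirst, std::size_t count)
{
    if (count == 0)
    {
        return 0;
    }

    // Address of the last byte of the last element.
    const unsigned char* pLastByte =
        reinterpret_cast<const unsigned char*>(pFirst + count) - 1;

    return static_cast<std::size_t>(CacheLineIndex(pLastByte) - CacheLineIndex(pFirst)) + 1;
}
```

Seven 8-byte flags occupy 56 bytes, so for an array like s_pThreadQuits with 7 threads this would report 1 or 2 lines depending on where the allocation starts, while seven 64-byte-aligned 64-byte structs would report 7.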

However, since s_pThreadQuits is neither sized to 64 bytes nor padded, it should also suffer from false sharing.

It should look like the image below.

[Image: illustration of false sharing]

The image is from https://www.codeproject.com/Articles/85356/Avoiding-and-Identifying-False-Sharing-Among-Threa
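For reference, this is roughly what I would expect a padded per-thread flag to look like if padding turned out to be necessary. It is only a sketch under the assumption of a 64-byte cache line; `SPaddedFlag` and `FlagStrideInBytes` are hypothetical names and do not appear in my code.

```cpp
#include <cstddef>

// Assumed cache-line size (64 bytes in my environment).
constexpr std::size_t kCacheLineSize = 64;

// One flag per cache line: alignas pads the struct up to 64 bytes and
// forces each element to start on a 64-byte boundary.
struct alignas(kCacheLineSize) SPaddedFlag
{
    long long llValue;
};

static_assert(sizeof(SPaddedFlag) == kCacheLineSize, "one flag should fill one cache line");
static_assert(alignof(SPaddedFlag) == kCacheLineSize, "flags should start on a line boundary");

// Distance in bytes between two neighbouring flags in an array.
inline std::size_t FlagStrideInBytes()
{
    SPaddedFlag flags[2] = {};
    return static_cast<std::size_t>(
        reinterpret_cast<const unsigned char*>(&flags[1]) -
        reinterpret_cast<const unsigned char*>(&flags[0]));
}
```

With plain `long long` the stride between neighbouring flags is 8 bytes; with this struct it is 64. Note that allocating such an over-aligned type with `new[]` is only guaranteed to respect the alignment from C++17 onward.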
