I am writing a zero-latency cloud gaming server. It's a software pipeline: in the first stage we capture the screen, and in the second stage we encode it to video.

However, after some time the second stage freezes. I tried many platform-independent approaches, but in vain: every one of them eventually freezes. The answer to "How to prevent threads from starvation in C++11" stated that we should use a mutex. I tried it, and it lasts longer, but it still freezes sometimes (rarely). I don't think a mutex is an explicit hint to prevent thread starvation either. (Maybe I am doing it wrong?)

Now I use a mutex and disable the Windows priority boost feature at the same time, but I don't like this solution at all. Could anyone provide an example of a starvation-free producer and consumer (preferably in C++11)?

producer:

while(Streamer.IsConnected()) {
    uint8_t *pBits = Streamer.AcquireNext();
    // AcquireNext() returns null when the buffer is full
    if(pBits && get_counter(&fps_limiter) >= 1000 / args.m_MaxFps && check_target_window(args.m_TargetWindow.c_str(), limit, &rect)) {
        BROFILER_FRAME("MainLoop")
        start_counter(&fps_limiter);
        if(!FAILED(capture_screen(g_DXGIManager, rect, pBits)))
            Streamer.PushNext();
    }
    else {
        this_thread::yield();
        // lower cpu usage
        Sleep(1);
        continue;
    }

    if (get_counter(&bit_rate) >= 1000) {
        uint32_t bps = Streamer.GetBitRate();
        printf("\rBitrate: %u bps, %u Bps\t\t\t\t\t", bps, bps/8);
        start_counter(&bit_rate);
    }
}

consumer:

    while(!m_ServerShouldStop) {
        uint8_t *data = AcquireLast();
        if (!data) {
            this_thread::yield();
            Sleep(1);
            continue;
        }
        // encoder callback
        uint8_t *out;
        uint32_t size = m_Encoder(data, &out);

        PopLast();

        // If encoder output something, send it immediately
        if(size>0) {
            // send the size of buffer
            int res1 = ::send_whole_buffer(client_sck, reinterpret_cast<uint8_t *>(&size),
                sizeof(size));
            // then the contents
            int res2 = ::send_whole_buffer(client_sck, out, size);

            bytes += size;

            if (m_EventHandler)
                m_EventHandler->onFrameSent();

            // If any of them fails....
            if(!res1||!res2)
                break;
        }
        if (get_counter(&counter) >= 1000) {
            m_Bps = bytes * 8;
            bytes = 0;
            start_counter(&counter);
        }

    }
...

Initially I did not protect the circular queue at all. I thought there was no race condition (one producer and one consumer). Then I tried adding a mutex, but nothing changed....
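
For reference, a circular queue with one producer and one consumer is only race-free without a mutex if the read and write indices are published with atomic operations; plain integer indices give no ordering guarantee between the two threads. Below is a minimal C++11 sketch of that idea. The class name SpscRing and its members are hypothetical and are not the Streamer code above.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer (illustrative names).
// Exactly one thread may call push() and exactly one thread may call pop().
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot is always kept empty");

public:
    // Producer thread only. Returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // full: one slot stays empty to tell full from empty
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Consumer thread only. Returns false if the ring is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;  // empty
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    T slots_[N];
    std::atomic<std::size_t> head_{0};  // next slot to write, advanced by the producer
    std::atomic<std::size_t> tail_{0};  // next slot to read, advanced by the consumer
};
```

Both calls return immediately instead of blocking, so a caller that gets false still has to decide how to wait (sleep, or a condition variable).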

Tim Hsu
  • I would suggest taking stack traces of what the threads are doing. If a thread is sitting in a function like WaitForSingleObject it should give you a big clue. Do a couple of stack traces in a row to be sure. – Steve Oct 31 '16 at 04:57
  • I use a profiler called Brofiler (available on GitHub). It hooks the Windows API. I didn't explicitly call WaitForSingleObject, but it seems that the thread gets into it (and then returns). – Tim Hsu Oct 31 '16 at 06:52
  • Hmm, no, if it *really* was a starvation problem then removing the boost feature is exactly the wrong thing to do. You are hiding a deadlock problem, very bad idea. Producer-consumer locks are readily available from Boost and the winapi (InitializeSRWLock etc), don't write your own. – Hans Passant Oct 31 '16 at 06:56
  • Is it available as a C++11 lock? – Tim Hsu Oct 31 '16 at 07:04
  • I provided the code above; if anything is missing, please tell me. – Tim Hsu Oct 31 '16 at 07:11
  • Re: "stated that we should use mutex" -- this is obviously wrong. The key to high performance multi-threaded applications (and to multi-threaded applications in general) is proper **design**, not implementation hacking. A mutex is a **tool**, not a solution. "Why does the chair I built collapse when I sit on it?" "Oh, you need a better hammer." – Pete Becker Oct 31 '16 at 13:12
  • There is no point in both yielding and sleeping. Sleep yields execution for at least the specified amount of time. On your actual problem, can you post a stack trace of the app when it's hung? – Steve Nov 01 '16 at 10:29
  • I added an answer below: code somewhere else corrupted the stack, making the program unstable. But I can't explain why it hung. – Tim Hsu Nov 01 '16 at 12:15

2 Answers

The word "freezing" implies a race condition rather than thread starvation.

Thread starvation is where all the threads concerned are competing for a single mutex and one thread (or a few of the threads) keeps grabbing the mutex, leaving the other threads starved. Having that much competition for a single mutex is an example of bad application design.

However, you said "freezing". Freezing implies you have ended up in a race condition where none of the (two or more) threads can acquire the mutex or satisfy some other constraint in your code, which in effect is a deadlock.

There isn't enough information in your question to provide any worthwhile answer. Please provide a code sample of exactly what you are doing and exactly what is happening.

tcwicks
  • I don't think a race condition occurs, because there is only one producer and one consumer (but I will check it one more time). I paused my program to see where each thread had got to; the "frozen" thread stops at a random line. – Tim Hsu Oct 31 '16 at 04:55
  • Hi Tim. The number of possibilities for what's freezing your code is enormous. It could be a race condition in your circular queue (Streamer). It could be that a resource issue in your screen capture routine is making the app unstable. It could be that your capture buffer is leaking memory, meaning maybe your circular queue is not really circular. Sorry, but there are way too many possibilities. – tcwicks Nov 02 '16 at 00:34
  • I know, but sorry, I am not allowed to provide the full source code here. – Tim Hsu Nov 02 '16 at 03:52

I found that my local variable was corrupted by my colleague's function, which made libx264 work incorrectly. Actually, the code can be written lock-free. Still, adding a mutex is better than busy waiting; it can lower the CPU usage a lot.
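
As a rough illustration of the "mutex instead of busy waiting" variant, here is a minimal C++11 sketch of a bounded queue built on std::mutex and std::condition_variable. The class name BlockingQueue and its members are hypothetical; this is not my actual Streamer code, only one way the idea could look.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded blocking queue (illustrative names).
template <typename T>
class BlockingQueue {
public:
    explicit BlockingQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full. Returns false if the queue was closed.
    bool push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return closed_ || items_.size() < capacity_; });
        if (closed_) return false;
        items_.push_back(std::move(value));
        lock.unlock();
        not_empty_.notify_one();  // wake the consumer if it is waiting
        return true;
    }

    // Blocks while the queue is empty. Returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();  // wake the producer if it is waiting
        return true;
    }

    // Wakes every waiter so both threads can shut down cleanly.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
    std::size_t capacity_;
    bool closed_ = false;
};
```

With this shape, the waiting thread sleeps inside wait() until the other side calls notify_one(), so neither loop needs yield() or Sleep(1). For a capture pipeline one would probably pass buffer indices or pointers through the queue rather than copying frames.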

Tim Hsu
  • With all due respect, I don't have the impression you are an expert for parallel programming. If you have multiple threads accessing the same data concurrently, do yourself a favor and use locks. – Markus Mayr Oct 31 '16 at 09:06
  • Thanks. I still added a lock to my circular queue later. – Tim Hsu Oct 31 '16 at 09:12