
I ran an experiment to simulate what happens in our server code. I started 1024 threads, each of which makes one system() call; this takes about 2.8 s to finish on my machine. Then I added usleep(1000000) to the thread function, and the execution time increased to 16 s; when I run the same program a second time, it drops to 8 s. I guess this may be caused by the CPU cache and context switching, but I'm not sure how to explain it.

Also, what is the best practice for avoiding this situation, where slightly increasing the running time of each thread sharply degrades the performance of the whole program?

I have attached the test code below. Thanks a lot for your help.

//largetest.cc
#include "local.h"
#include <stdio.h>  // printf
#include <time.h>
#include <thread>
#include <string>
#include "unistd.h"

using namespace std;

#define BILLION 1000000000L

int main()
{

    struct timespec start, end;
    double diff;

    clock_gettime(CLOCK_REALTIME, &start);

    int i = 0;
    int reqNum = 1024;

    for (i = 0; i < reqNum; i++)
    {
        string command = string("echo abc");
        thread{localTaskStart, command}.detach();
    }

    while (1)
    {
        if ((localFinishNum) == reqNum)
        {
            break;
        }
        else
        {
            usleep(1000000);
        }
        printf("curr num %d\n", localFinishNum);
    }

    clock_gettime(CLOCK_REALTIME, &end); /* mark the end time */
    diff = (end.tv_sec - start.tv_sec) * 1.0 + (end.tv_nsec - start.tv_nsec) * 1.0 / BILLION;
    printf("debug for running time = (%lf) second\n", diff);

    return 0;
}

//local.cc
#include "time.h"
#include "stdlib.h"
#include "stdio.h"
#include "local.h"
#include "unistd.h"
#include <string>
#include <mutex>

using namespace std;

mutex testNotifiedNumMtx;
int localFinishNum = 0;

int localTaskStart(string batchPath)
{

    char command[200];

    // Bounded copy so a long command cannot overflow the 200-byte buffer.
    snprintf(command, sizeof(command), "%s", batchPath.c_str());

    usleep(1000000);

    system(command);

    testNotifiedNumMtx.lock();
    localFinishNum++;
    testNotifiedNumMtx.unlock();

    return 0;
}

//local.h


#ifndef local_h
#define local_h

#include <string>

using namespace std;

int localTaskStart( string batchPath);

extern int localFinishNum;
#endif
wangzhe

1 Answer


The read of localFinishNum in main() should also be protected by the mutex (testNotifiedNumMtx). Without it the read is a data race, and the results are unpredictable: they depend on where (i.e. on which cores) the threads get scheduled, when and how the cache gets invalidated, and so on.
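A minimal sketch of a mutex-protected read, assuming the counter and its mutex live in the same translation unit. The names finishMtx, finishNum, markFinished, readFinished and runWorkers are hypothetical stand-ins for the question's testNotifiedNumMtx, localFinishNum and localTaskStart, not part of the original code.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex finishMtx;  // stands in for testNotifiedNumMtx
int finishNum = 0;     // stands in for localFinishNum

// Writer side: what each worker does once its task is done.
void markFinished() {
    std::lock_guard<std::mutex> lock(finishMtx);
    ++finishNum;
}

// Reader side: take the same mutex before reading the counter.
int readFinished() {
    std::lock_guard<std::mutex> lock(finishMtx);
    return finishNum;
}

// Hypothetical helper: start n workers, join them, return the final count.
int runWorkers(int n) {
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back(markFinished);
    }
    for (auto& t : workers) {
        t.join();
    }
    return readFinished();
}
```

In the question's program, the polling loop in main() would compare readFinished() against reqNum instead of reading the global directly. The mutex would then need to be reachable from largetest.cc, for example through an accessor declared in local.h.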

In fact, the program might not even terminate if you compile it with optimizations enabled, because the compiler may decide to keep localFinishNum in a register instead of loading it from memory on every iteration.
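One way to rule out the stale-register problem, sketched here as an alternative to locking around the read, is to make the counter a std::atomic<int>, so that every load in the polling loop is a real, race-free read. The names finishedCount, worker and waitForAll are hypothetical, and the 1 ms polling interval is an arbitrary choice for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Atomic counter: increments and loads are race-free without a mutex.
std::atomic<int> finishedCount{0};

// Worker body: signal completion with an atomic increment.
void worker() {
    finishedCount.fetch_add(1);
}

// Start n detached workers and poll until all of them have reported in.
int waitForAll(int n) {
    for (int i = 0; i < n; ++i) {
        std::thread(worker).detach();
    }
    // The compiler cannot hoist an atomic load out of the loop.
    while (finishedCount.load() != n) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return finishedCount.load();
}
```

This keeps the detach-and-poll structure of the original test. Joining the threads, or waiting on a std::condition_variable, would avoid polling altogether.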

Grant Miller