Confusion in Multithreading in C++

Question

I am trying to simulate a probability problem in which there are n clients and n servers. Each client sends a request to a server chosen at random, so each server can receive any number of requests. I have to calculate the expected value of the maximum number of requests received by any single server.

I am trying to simulate this by running 10,000 iterations, where in each iteration each client chooses a random server and sends a request to it. The servers are represented as an integer array of size N.

Each client picks a random index, and the value at that index in the server array is incremented. For better results, the problem statement says N should be about 10⁶.

To make it a little faster, I used multithreading, with 10 threads in total and each thread running 100 iterations.

But the multithreaded code produces very different results from the normal code. Below are the code snippets and the output for both of them.

Normal Version

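#include <iostream>
#include <random>
#include <chrono>
#include <algorithm> // std::max
#include <cstdlib>   // std::srand, std::rand
#include <ctime>     // time

#define N 1000000
#define iterations 1000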

int servers[N];

// This array's i'th index will contain count of in how many
// iterations was i the maximum number of requests received by any  server
int distr[N+1]={0};

int main(int argc, char const *argv[])
{   
   // Initialising
   auto start = std::chrono::high_resolution_clock::now();

   std::srand(time(NULL));

   // Performing iterations
   for(int itr=1; itr<=iterations; itr++)
   {
       for(int i=0;i<N;i++)
       {
           servers[i]=0;
       }

       for(int i=1;i<=N;i++)
       {
           int index = std::rand()%N;
           servers[index]++;
       }

       int maxRes = -1;
       for(int i=0;i<N;i++)
       {
           maxRes = std::max(maxRes, servers[i]);
       }
       distr[maxRes]+=1;
   }

   for(int i=0;i<=15;i++)
   {
      std::cout<<(double)distr[i]<<std::endl;
   }

   auto stop = std::chrono::high_resolution_clock::now();
   auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
   std::cout<<duration.count()<<" milliseconds"<<std::endl;

   return 0;
}

Output

0
0
0
0
0
0
0
359
552
79
10
0
0
0
0
0
1730 milliseconds

Multithreaded Version

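#include <iostream>
#include <random>
#include <chrono>
#include <thread>
#include <fstream>
#include <atomic>    // std::atomic
#include <algorithm> // std::max
#include <cstdlib>   // std::srand, std::rand
#include <ctime>     // time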

#define N 100000
#define iterations 1000
#define threads 10

// This array's i'th index will contain count of in how many
// iterations was i the maximum number of requests received by any server
std::atomic<int> distr[N] = {};

void execute(int number)
{
    // Performing iterations
    int servers[N]={0};
    for(int itr=1; itr<=number; itr++)
    {

        for(int i=1;i<=N;i++)
        {
            int index = std::rand()%N;
            servers[index]++;
        }

        int maxRes = -1;
        for(int i=0;i<N;i++)
        {
            maxRes = std::max(maxRes, servers[i]);
            servers[i]=0;
        }

        distr[maxRes] += 1;
    }
}

int main(int argc, char const *argv[])
{   
    // Initialising
    auto start = std::chrono::high_resolution_clock::now();

    std::srand(time(NULL));

    std::thread t[threads];
    for(int i=0;i<threads;i++)
    {
        t[i] = std::thread(execute, iterations/threads);
    }   

    for(int i=0;i<threads;i++)
    {
        t[i].join();
    }

    for(int i=0;i<=15;i++)
    {
        double temp = (double)distr[i];
        std::cout<<i<<"\t"<<distr[i]<<std::endl;
    }

    auto stop = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::cout<<duration.count()<<" milliseconds"<<std::endl;

    return 0;
}

Output

0   0
1   0
2   0
3   0
4   0
5   0
6   0
7   7
8   201
9   421
10  267
11  68
12  2
13  2
14  4
15  0

1385 milliseconds

In contrast, I have run the normal code many times, and each time the count for maximum = 9 is above 500 and the data is much less scattered: only maximum = 8, 9, 10, 11 have significant counts, and all the rest are zero.

Can anyone please explain what I am doing wrong?

Thanks in advance!

If you're programming in C++ then please don't add other language tags. — Some programmer dude, Aug 27 '18 at 10:56
@MohammadrezaPanahi But in which part can there be data race ? For distr array I am already using std::atomic ? — Vaibhav Thakkar, Aug 27 '18 at 10:57
Maybe the problem you are trying to solve isn't as parallelizable as you think? If you halve the number of threads, or double it, does the result differ much? Oh, and how many cores does your CPU have? More threads than cores could make it slower. — Some programmer dude, Aug 27 '18 at 11:01
@Someprogrammerdude I tried to reduce the threads into half making them 5 but that resulted in more time than the one with 10 threads, not very much but a difference of few seconds. — Vaibhav Thakkar, Aug 27 '18 at 11:07
Oh by the way, it's implementation defined if [`std::rand`](https://en.cppreference.com/w/cpp/numeric/random/rand) is thread-safe or not. Better not use it at all, and instead use [other standard PRNG facilities](https://en.cppreference.com/w/cpp/numeric/random) (like e.g. [`std::uniform_int_distribution`](https://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution)). — Some programmer dude, Aug 27 '18 at 11:15
@Someprogrammerdude Thanks! I just wanted to ask will there be any problem if I declare the uniform_int_distribution object as global so that all threads could access it, or should I pass it as a parameter to the thread function ? — Vaibhav Thakkar, Aug 27 '18 at 12:09
Define it (and its support objects) as a local variable inside each thread function. — Some programmer dude, Aug 27 '18 at 12:22
It did change the output, but now for large sizes of the server array I get a segmentation fault, because the stack size limit for threads is very low on macOS. Any suggestions for that? — Vaibhav Thakkar, Aug 27 '18 at 12:42
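One possible way around the per-thread stack limit raised in the last comment is to keep the server array on the heap, for example in a `std::vector`, instead of declaring `int servers[N]` inside the thread function. The following is only a minimal sketch under that assumption. The helper name `max_load_once` is hypothetical and does not appear in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: runs one iteration with servers.size() clients and
// servers.size() servers, and returns the largest number of requests that
// any single server received.
// `servers` is a caller-owned std::vector, so its storage is on the heap and
// its size is not bounded by the thread's stack limit.
int max_load_once(std::vector<int>& servers, std::mt19937& engine)
{
    assert(!servers.empty());

    // Reset the counters left over from the previous iteration.
    std::fill(servers.begin(), servers.end(), 0);

    const int n = static_cast<int>(servers.size());
    std::uniform_int_distribution<int> pick(0, n - 1);

    // Every client sends one request to a uniformly chosen server.
    for (int client = 0; client < n; ++client)
    {
        ++servers[pick(engine)];
    }

    return *std::max_element(servers.begin(), servers.end());
}
```

A thread function could create `std::vector<int> servers(N)` and its own `std::mt19937` once, then call this helper in a loop, so only a few small objects remain on the thread's stack.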

einpoklum · Accepted Answer · 2018-08-27T11:41:00.543

1

I don't see "very different results", they're just somewhat different, so it seems it's something a bit subtle. I've noticed you're not seeding each thread separately - that might have something to do with it.

PS: You shouldn't use rand() % N if you want a uniform distribution. Why? See this explanation by Stephan T. Lavavej. As commenters suggest, the skew may be small when N is small, but still.
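The two suggestions above (seed each thread separately, and avoid `rand() % N`) could be combined roughly as follows. This is a hedged sketch, not the asker's final code: the names `run_iterations` and `simulate` are hypothetical, seeding each `std::mt19937` from a single `std::random_device` value is a simplification, and each thread writes to its own histogram so that no atomics are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical worker: each thread owns its engine, distribution, server
// buffer and histogram, so no random-number state is shared between threads.
void run_iterations(int n, int iterations, std::uint32_t seed,
                    std::vector<int>& histogram)
{
    std::mt19937 engine(seed);
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<int> servers(n); // heap storage, independent of stack limits

    for (int itr = 0; itr < iterations; ++itr)
    {
        std::fill(servers.begin(), servers.end(), 0);
        for (int client = 0; client < n; ++client)
        {
            ++servers[pick(engine)];
        }
        // Record the maximum load seen in this iteration.
        ++histogram[*std::max_element(servers.begin(), servers.end())];
    }
}

// Hypothetical driver: entry k of the result counts the iterations whose
// maximum load was exactly k.
std::vector<int> simulate(int n, int iterations_per_thread, int thread_count)
{
    assert(n > 0 && iterations_per_thread >= 0 && thread_count > 0);

    std::random_device rd;
    std::vector<std::vector<int>> partial(thread_count,
                                          std::vector<int>(n + 1, 0));
    std::vector<std::thread> workers;

    for (int t = 0; t < thread_count; ++t)
    {
        // rd() is called on the main thread, one fresh seed per worker.
        workers.emplace_back(run_iterations, n, iterations_per_thread,
                             rd(), std::ref(partial[t]));
    }
    for (auto& w : workers)
    {
        w.join();
    }

    // Merge the per-thread histograms after all workers have finished.
    std::vector<int> total(n + 1, 0);
    for (const auto& h : partial)
    {
        for (int k = 0; k <= n; ++k)
        {
            total[k] += h[k];
        }
    }
    return total;
}
```

Calling `simulate(1000000, 100, 10)` would correspond to the 10 threads of 100 iterations described in the question. On some toolchains the program has to be built with `-pthread`. Note also that the two listings in the question define different values of `N` (1000000 and 100000), so their histograms are not directly comparable even without any threading issue.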

edited Aug 27 '18 at 11:41

answered Aug 27 '18 at 11:06

einpoklum


*never use rand() % N if you want a uniform distribution!* That's more of a comment than an answer. If `N` is much less than `RAND_MAX`, any skew resulting from `RAND_MAX % N` not being zero will be negligible for anything but uses such as actual cryptography. Someone capable of successfully writing that type of code would already know not to do that. *You're not seeding each thread separately* Why? Seeding `std::rand()` more than once would likely be wrong. – Andrew Henle Aug 27 '18 at 11:27
@hellow: Fixed. – einpoklum Aug 27 '18 at 11:43
@AndrewHenle: You're right, but while `N` may be low in OP's toy example, it may change, and it may not be low for other people reading this question. – einpoklum Aug 27 '18 at 11:43
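The size of the skew debated in these comments can be worked out directly: if a generator returns every value in `[0, max_value]` equally often, then `value % n` hits some residues one more time than others whenever `max_value + 1` is not a multiple of `n`. A small sketch of that arithmetic follows. The helper name `preimage_count` is hypothetical, and the code assumes `max_value + 1` does not overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many values v in [0, max_value] satisfy
// v % n == residue. Assumes max_value + 1 fits in 64 bits.
std::uint64_t preimage_count(std::uint64_t max_value, std::uint64_t n,
                             std::uint64_t residue)
{
    assert(n > 0 && residue < n);
    const std::uint64_t total = max_value + 1; // number of possible values
    // Every residue gets total / n values; the first total % n residues
    // get one extra value each.
    return total / n + (residue < total % n ? 1 : 0);
}
```

With the smallest `RAND_MAX` the standard permits, 32767, and `n` = 10000, residues 0 to 2767 are each produced by 4 values and the remaining residues by 3, which is a substantial skew. With a `RAND_MAX` of 2147483647 (an assumption about the asker's platform) and `n` = 1000000, the counts are 2148 versus 2147, a relative difference of about 0.05%.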
