
I'm trying to learn a bit more about threads in C++, or more precisely, POSIX threads. I wrote a small program that writes a given number of random strings of a fixed length into one big `char* cstr` buffer that holds all of them, which is intended to be used by another C program.

I've noticed that using 2 threads speeds the generation up nicely, but anything past that actually slows the program down, even below the speed of a single thread. I'm not sure whether my approach is wrong (e.g. using `strcpy()` to copy the strings into `cstr`) or whether more than 2 threads is simply not worth it. Could there be a risk involved in writing to a global char pointer from several threads?

`CreateCString()` creates the requested number of threads, assigns each one its range of work via the `MyArgs` struct, runs the worker function and then waits for all of them to finish.

struct MyArgs {
    long start;
    long end;
};


void CreateCString() {
    pthread_t threads[NUM_THREADS];
    MyArgs args[NUM_THREADS];  // one argument block per thread

    int prev = 0;
    int add = TOTAL / NUM_THREADS;
    int prev2 = add;
    for (int i = 0; i < NUM_THREADS; i++)
    {
        args[i].start = prev;
        // The last thread also takes the TOTAL % NUM_THREADS leftover strings.
        args[i].end = (i == NUM_THREADS - 1) ? TOTAL : prev2;
        prev2 += add;
        prev = args[i].end;

        pthread_create(&threads[i], NULL, myfunc, &args[i]);

        cout << "args start of " << i << ": " << args[i].start << endl;
        cout << "args end of " << i << ": " << args[i].end << endl;
    }
    for (int i = 0; i < NUM_THREADS; i++)
    {
        pthread_join(threads[i], NULL);
    }
}
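Because `add = TOTAL / NUM_THREADS` is integer division, ranges built only from multiples of `add` miss the last `TOTAL % NUM_THREADS` strings: with `TOTAL = 10` and 3 threads they are [0,3), [3,6), [6,9), so slot 9 is never written. Clamping the last `end` to `TOTAL` closes the gap but hands every leftover string to one thread. A minimal sketch that spreads the leftovers one per thread instead; `Range` and `splitWork` are hypothetical names (`Range` plays the role of `MyArgs`):

```cpp
#include <cassert>
#include <vector>

// Half-open range [start, end) of string indices for one worker
// (hypothetical stand-in for the question's MyArgs).
struct Range {
    long start;
    long end;
};

// Split `total` items into `num_threads` contiguous ranges. The first
// (total % num_threads) ranges get one extra item, so every index in
// [0, total) is covered exactly once, even when total is not divisible.
std::vector<Range> splitWork(long total, int num_threads) {
    assert(num_threads > 0 && total >= 0);
    std::vector<Range> ranges;
    ranges.reserve(num_threads);
    const long base = total / num_threads;
    const long extra = total % num_threads;
    long start = 0;
    for (int i = 0; i < num_threads; ++i) {
        const long len = base + (i < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}
```

Each `Range` can then be handed to a thread exactly like `args[i]`.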

The start routine that every thread runs; it unpacks the arguments and passes them to the function that does the real work.

void* myfunc(void* p) {
    // Unpack this thread's [start, end) range and pass it to the worker.
    MyArgs* p_arg = static_cast<MyArgs*>(p);
    addString(p_arg->start, p_arg->end);
    pthread_exit(NULL);
}
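Pepijn Kramer's comment below suggests `std::thread`, which is C++11 and therefore also available in the C++14 used here. Its constructor forwards extra arguments to the function, so the `MyArgs` struct and the `void*` cast in `myfunc` become unnecessary. A sketch under that assumption; `fillRange` and `runChunks` are hypothetical, and `fillRange` only marks its slots so the example runs on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::ref
#include <thread>
#include <vector>

// Hypothetical stand-in for addString(start, end): it only marks each index
// it owns, so the sketch runs without the rest of the program.
void fillRange(std::vector<char>& out, long start, long end) {
    for (long i = start; i < end; ++i) {
        out[static_cast<std::size_t>(i)] = 'x';
    }
}

// One std::thread per contiguous chunk [start, end), then join them all.
// std::thread forwards start/end by value; std::ref passes the buffer by reference.
void runChunks(std::vector<char>& out, int num_threads) {
    assert(num_threads > 0);
    const long total = static_cast<long>(out.size());
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i) {
        // total * i / num_threads covers [0, total) with no gaps or overlaps.
        const long start = total * i / num_threads;
        const long end = total * (i + 1) / num_threads;
        workers.emplace_back(fillRange, std::ref(out), start, end);
    }
    for (std::thread& t : workers) {
        t.join();  // same role as pthread_join
    }
}
```

With GCC or Clang this typically needs `-pthread` when compiling and linking.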

The actual meat and potatoes. It offsets into `cstr` by the requested string length, so each thread writes into its own range of slots.

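void addString(int start, int end) {
    for (int i = start; i < end; i++)
    {
        // Slot i starts at cstr + i * STRING_LENGTH. strcpy copies STRING_LENGTH
        // characters plus a '\0', which lands on the first byte of slot i + 1.
        strcpy((cstr + i * STRING_LENGTH), generateRandomString(STRING_LENGTH).c_str());
    }
}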
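On the risk of writing to the shared buffer: separate threads writing to separate bytes of `cstr` is fine, but the ranges here are not fully separate. `strcpy` copies `STRING_LENGTH` characters plus a terminating `'\0'`, and with a stride of `STRING_LENGTH` that `'\0'` lands on the first byte of the next slot. Inside one thread this is harmless because the next iteration overwrites it, but the last string of each range writes its `'\0'` onto the first character of the next thread's first string. That is a data race, and if the neighbour has already written its string, the buffer ends up with an embedded `'\0'` that truncates it for a C consumer. The sketch below follows Jérôme Richard's `memcpy` suggestion and assumes the intended layout is one concatenated string, as the `i * STRING_LENGTH` stride implies; `writeSlot` and `finishBuffer` are hypothetical helpers. `len` is an ordinary runtime parameter, so this still works when the string length changes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy exactly `len` characters of `s` into slot i (slot i starts at
// buf + i * len). memcpy writes no trailing '\0', so a thread never
// touches the first byte of a slot owned by another thread.
void writeSlot(char* buf, std::size_t i, const std::string& s, std::size_t len) {
    assert(s.size() == len);  // every slot is exactly len characters wide
    std::memcpy(buf + i * len, s.data(), len);
}

// Call once, after all threads have been joined: terminate the whole
// concatenated buffer (total * len characters) with a single '\0'.
void finishBuffer(char* buf, std::size_t total, std::size_t len) {
    buf[total * len] = '\0';
}
```

If separate NUL-terminated strings are wanted instead, a stride of `STRING_LENGTH + 1` also removes the overlap, and the existing allocation of `TOTAL * (STRING_LENGTH + 1)` bytes already has room for it.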

Function used for random string generation:

std::string generateRandomString(const int max_length) {
    using namespace std;
    string possible_characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    random_device rd;
    mt19937 engine(rd());
    uniform_int_distribution<> dist(0, possible_characters.size() - 1);
    string ret = "";
    for (int i = 0; i < max_length; i++) {
        int random_index = dist(engine); //get index between 0 and possible_characters.size()-1
        ret += possible_characters[random_index];
    }
    return ret;
}
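The comments point at the main cost here: every call builds a new `std::random_device`, a new `std::mt19937` and a heap-allocated `std::string`, each of which has a real setup cost. Simply moving those objects to plain globals, however, would make all threads share one engine, which is a data race on the engine's state; that might also be one reason the multi-threaded runs got slower after the objects were moved out (as reported in the comments), though that is a guess, not something measured here. A minimal sketch that swaps in a `thread_local` engine (one per thread, seeded once) and writes characters straight into the caller's buffer, so no `std::string` is allocated; `kChars` and `fillRandom` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Character set as a plain array with static storage: initialised once, only read.
static const char kChars[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Write `len` random characters from kChars to dest (no '\0' is written).
void fillRandom(char* dest, std::size_t len) {
    assert(dest != nullptr);
    // One engine per thread, seeded on that thread's first call, instead of a
    // new random_device + mt19937 per string. thread_local means no two
    // threads ever share (and race on) the same engine.
    thread_local std::mt19937 engine(std::random_device{}());
    // sizeof(kChars) - 1 is the character count, so - 2 is the last valid index.
    std::uniform_int_distribution<std::size_t> dist(0, sizeof(kChars) - 2);
    for (std::size_t i = 0; i < len; ++i) {
        dest[i] = kChars[dist(engine)];
    }
}
```

In `addString` the loop body would then become `fillRandom(cstr + i * STRING_LENGTH, STRING_LENGTH);`, which also avoids the `'\0'` that `strcpy` writes past each slot. If `std::string` is kept instead, `ret.reserve(max_length)` at least reduces the allocations, as 1201ProgramAlarm suggests.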

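main():

// Headers, macros and globals; in the real file these must precede the functions above.
#include <pthread.h>
#include <chrono>
#include <cstring>
#include <iostream>
#include <random>
#include <string>

using std::cout;
using std::cin;
using std::endl;

#define NUM_THREADS 2
#define STRING_LENGTH 8

int TOTAL;
char* cstr;

int main()
{
    cout << "total strings?"; cin >> TOTAL;

    cstr = new char[TOTAL * (STRING_LENGTH + 1)];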


    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

    CreateCString();
    
    
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

    std::cout << "Time elapsed in total = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << "[micros]" << std::endl;
    std::cout << "Time elapsed in total = " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << "[ms]" << std::endl;
    std::cout << "Time elapsed in total = " << std::chrono::duration_cast<std::chrono::seconds>(end - begin).count() << "[s]" << std::endl;
    cout << "----------------------------------------------------------------------" << endl;
    cout << endl;

    delete[] cstr;

}

Is this even the right approach if we are going for relatively high speed? How else could we fill `cstr` with all the wanted strings? Or is it just not worth using more than about 2 threads for large counts (~1 million)?

Here are some example times I get when generating 1,000,000 strings with different numbers of threads:


Total strings: 1 000 000

| Number of threads | Total time |
|---|---|
| 1 | 19 s |
| 2 | 10 s |
| 3 | 9 s |
| 4 | 38 s (this one makes me scratch my head) |


G3cko
  • Unrelated: Did you really mean to create 16 * NUM_THREADS instances of `MyArgs`? – Botje Aug 23 '21 at 14:28
  • Just curious, but which version of C++ can you use? If it's C++11 or C++17, then you might consider std::thread (an instance of a new thread) and std::async (uses a thread from a thread pool if needed) – Pepijn Kramer Aug 23 '21 at 14:32
  • That must've slipped over my head. No I didn't, thanks for clearing that up. NUM_THREADS should suffice. – G3cko Aug 23 '21 at 14:32
  • I'm using C++14 right now, I guess that means I can't use std::async? – G3cko Aug 23 '21 at 14:34
  • Can you edit your question to show the actual numbers you're getting in terms of wall-clock time and bandwidth? Also, what is the implementation of `generate`? Is that accidentally mutating some global shared state behind a lock? – Botje Aug 23 '21 at 14:39
  • I've included some example times. The generate function is just the function I use to generate the random strings. I've also added its implementation, generateRandomString(), in the edit. – G3cko Aug 23 '21 at 14:59
  • 19 seconds to generate 1 million random strings sounds excruciatingly slow. It took me 1.6 seconds with a quick Perl oneliner. Have you tried moving some code out of `generateRandomString`? Especially the initialization of `rd`, `engine`, and `possible_characters`. – Botje Aug 23 '21 at 15:30
  • @Botje I've moved the 3 inits out of the function now, and the results leave me really confused. Using your method I was able to cut the generation down to about 1 second which is quite incredible. But if I use more than 1 Thread now the generation takes a really long time, like 20 seconds for 2 Threads. So I guess this probably means my implementation of Multithreading is somewhat broken? I – G3cko Aug 23 '21 at 16:09
  • There are lots of memory allocations (which will typically block in multithreaded apps) behind the scenes in all those string additions. You can reduce the number of allocations by making use of `reserve`, `assign`, or `insert` to preallocate the space for `ret`. You can also rewrite that process to pass in a reusable pre-allocated buffer to `generateRandomString` and eliminate the use of `std::string` entirely. – 1201ProgramAlarm Aug 23 '21 at 16:25
  • @G3cko if contention for the memory is really a problem, that should go away if you use the jemalloc allocator, for example, as that has a memory allocator per thread. – Botje Aug 23 '21 at 18:46
  • Allocations generally do not scale, indeed (and are not very efficient in any case). Besides this, creating new random objects for each string is not great either. Moreover, since `STRING_LENGTH` is known, you can use memcpy rather than strcpy. `possible_characters` can be allocated on the stack using a plain C buffer, or even better, put in a shared variable filled ahead of time. – Jérôme Richard Aug 23 '21 at 18:46
  • @1201ProgramAlarm I haven't heard of these functions yet; that seems like something I have to learn from scratch. Is there a collective term for these functions, so I can look into the idea behind them? – G3cko Aug 23 '21 at 18:59
  • @JérômeRichard Well the point is to make the String_length interchangeable so that doesn't work for me. What's the difference between just declaring the characters on the stack vs using a shared variable here? If I declare characters globally doesn't that make it kind of shared? – G3cko Aug 23 '21 at 19:09
  • The compiler likely writes all the array values to the stack, which is quite cheap but not free. With a global variable (which is shared by default), the writes are done once and there is no need to fill the array anymore (zero cost). In the current code, the string will likely be allocated on the heap and then filled before being freed, which is inefficient (especially if `max_length` is smaller than `possible_characters.size()`). – Jérôme Richard Aug 23 '21 at 19:23
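Regarding the C++14 question above: `std::async` and `std::future` were added in C++11, so they can be used in C++14. The standard does not require a thread pool; whether one is used is implementation-specific. A sketch of the same chunking with `std::async`; `work` and `runAsync` are hypothetical, and `work` only returns the size of its range so the example is self-contained:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical worker standing in for addString(start, end): it just reports
// how many indices it was given.
long work(long start, long end) {
    return end - start;
}

// Split [0, total) into num_tasks chunks and run each one with std::async.
long runAsync(long total, int num_tasks) {
    assert(num_tasks > 0);
    std::vector<std::future<long>> futures;
    for (int i = 0; i < num_tasks; ++i) {
        const long start = total * i / num_tasks;
        const long end = total * (i + 1) / num_tasks;
        // std::launch::async runs the task as if on a new thread; with the
        // default policy the call may instead be deferred until get().
        futures.push_back(std::async(std::launch::async, work, start, end));
    }
    long done = 0;
    for (std::future<long>& f : futures) {
        done += f.get();  // waits for the task, like pthread_join
    }
    return done;
}
```

Calling `get()` on each future plays the role of `pthread_join` and also rethrows any exception thrown by the task.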
