0

I have mmapped a huge file into a char string and made a C++ string out of it. I need to parse this string on a delimiter character, which is a space, and store the values in a matrix. I can do it from one thread, but I need to optimize it, so I'm using multiple threads to parse strings from this stringstream and store them in the matrix. Based on the thread ID I can store the parsed data into the matrix without conflicts, but how do I synchronize the parsing, since any thread can be scheduled at any time and parse a string? Here is my code:

void* parseMappedString(void* args)
{
    char temp[BUFFSIZE];
    long int threadID = *((long int*)args);
    if (threadID  < 0)
        threadID = 0;

    for (int i = ((threadID) * 160); i < ((threadID+1) * 160); i++)
    {
        for (int j = 0; j < 4000; j++)
        {
            pthread_mutex_lock(&ParseMatrixMutex);
            if ((matrix_str.getline(temp,BUFFSIZE, ' ')) )
            {
                pthread_mutex_unlock(&ParseMatrixMutex);
                matrix[i][j] = parseFloat((temp));
            }
            else
            {
                pthread_mutex_unlock(&ParseMatrixMutex);
            }
        }
    }
}

void create_threads_for_parsing(void)
{
    long int i;

    for (i = 0; i < 5; i++)
        pthread_create(&Threads[i], NULL, parseMappedString, (void*)&i);
}

As you can see in the code, there are five threads in total and each thread processes 160 * 4000 elements. Each thread stores its results at a location in the matrix determined by its thread ID, so the writes do not conflict. But getline can be called by any thread at any time, so thread 5 can parse data that belongs to the first thread. How do I avoid this?

I had to add the following because I receive thread IDs 1-4 in args but never 0. It always arrives as some junk negative value, so I had to hardcode it like this:

if (threadID < 0) threadID = 0;

Pthread
  • The idea is that thread 1 should not parse, e.g., the element at 178 * 278, since it belongs to thread 2 – Pthread Jun 25 '13 at 12:41
  • How do you expect thread 2 to locate that element without either reading all the preceding elements or waiting for thread 1 to finish reading "its" parts? To do what you want you'd need every thread to wait for the previous thread to finish, so you won't gain anything by multithreading it. – molbdnilo Jun 25 '13 at 13:03
  • Does that mean there is no way to optimize the file reading? It takes me 1.5 seconds to read a text file containing 4k * 4k floats and store it in the matrix. Is there no way to optimize it? – Pthread Jun 25 '13 at 13:12
  • How fast do you expect it to get? 16 million floats in 1.5 seconds is ~100 nanoseconds per float, which is ~300 clocks on a 3GHz CPU. That's not bad. If you want them faster, store them in binary to avoid parsing. – molbdnilo Jun 25 '13 at 13:48

4 Answers

1

I have mmapped a huge file into char string and made a c++ string

Don't: std::string has to copy the memory, so you lose the performance improvement mmap would otherwise give you. Just work on the raw memory as a char array.
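To illustrate working on the raw memory, here is a minimal sketch that parses space-separated floats straight out of a byte range. The name `parse_floats` and the 63-character token limit are assumptions for illustration, not part of the question's code. Only one token at a time is copied, because `strtof` needs a NUL-terminated string and a mapped file is not guaranteed to end with one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Parse floats separated by spaces or newlines from the half-open byte
// range [begin, end), without copying the whole range into a std::string.
// Assumption: no single token is longer than 63 characters.
std::vector<float> parse_floats(const char* begin, const char* end)
{
    assert(begin <= end);
    std::vector<float> values;
    const char* p = begin;
    while (p < end) {
        // Skip delimiters.
        while (p < end && (*p == ' ' || *p == '\n')) ++p;
        if (p == end) break;

        // Find the end of the current token.
        const char* tok = p;
        while (p < end && *p != ' ' && *p != '\n') ++p;

        // Copy just this token into a small NUL-terminated buffer.
        char buf[64];
        std::size_t len = static_cast<std::size_t>(p - tok);
        if (len >= sizeof buf) len = sizeof buf - 1;
        std::memcpy(buf, tok, len);
        buf[len] = '\0';
        values.push_back(std::strtof(buf, nullptr));
    }
    return values;
}
```

For a mapped region starting at `base` with `length` bytes, the call would be `parse_floats(base, base + length)`.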

I could do it from one thread but I need to optimize it

Are you sure multiple threads will optimize it? Did you profile and confirm it's definitely CPU-bound and not I/O bound?


If you're sure multiple threads is the way to go, I'd suggest doing this:

  1. create N threads (this should be based on the number of cores and then tweaked according to test results)
  2. carve your mmap'd region up into N blocks of approximately equal size
    • you can just search back & forth for the nearest newline to your block boundary
  3. have each thread n create its own independent output
  4. combine all the outputs afterwards
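Step 2 above can be sketched as follows. The function name `split_on_delimiter` is hypothetical, and this version only searches forward for the delimiter rather than in both directions. It returns the block boundaries as offsets, so each thread can be handed one `[offsets[k], offsets[k + 1])` range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return n + 1 offsets into data[0, size) such that block k is
// [offsets[k], offsets[k + 1]) and no token straddles two blocks.
std::vector<std::size_t> split_on_delimiter(const char* data, std::size_t size,
                                            std::size_t n, char delim)
{
    assert(n > 0);
    std::vector<std::size_t> offsets(n + 1);
    offsets[0] = 0;
    offsets[n] = size;
    for (std::size_t k = 1; k < n; ++k) {
        // Start from the even split point, but never behind the previous cut.
        std::size_t pos = size / n * k;
        if (pos < offsets[k - 1]) pos = offsets[k - 1];
        // Move forward to the next delimiter so the cut lands between tokens.
        while (pos < size && data[pos] != delim) ++pos;
        offsets[k] = pos;
    }
    return offsets;
}
```

Blocks after the first begin with the delimiter itself, so the parser for each block should skip leading delimiters. If tokens are very long relative to the block size, some blocks may come out empty, which is harmless.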

As for the bug in the code I'm trying to persuade you not to use: you pass (void*)&i as your argument to the thread function. This is a pointer to an automatic local that goes out of scope at the end of create_threads_for_parsing, so it's likely to be random garbage by the time any thread reads it. Even if it weren't random garbage (i.e., if create_threads_for_parsing joined all the threads before returning, to keep i in scope), it would be the same pointer for each thread.

To safely pass a distinct integer id to each thread, you should allocate a distinct integer for each thread, and pass its address. It's either that or mess around with intptr_t.
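A minimal sketch of the first option, assuming a hypothetical `WorkerArg` struct and `run_workers` helper (neither is in the question's code): each thread gets the address of its own argument object, and the threads are joined before that storage is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread argument: a distinct id plus a private slot
// where the worker reports which id it received.
struct WorkerArg {
    long id;
    long* seen;
};

static void* worker(void* p)
{
    WorkerArg* arg = static_cast<WorkerArg*>(p);
    *arg->seen = arg->id;
    return nullptr;
}

// Start nthreads workers, each with its own WorkerArg, and join them all
// before the argument storage goes out of scope.
std::vector<long> run_workers(std::size_t nthreads)
{
    assert(nthreads > 0);
    std::vector<long> seen(nthreads, -1L);
    std::vector<WorkerArg> args(nthreads);
    std::vector<pthread_t> threads(nthreads);

    for (std::size_t i = 0; i < nthreads; ++i) {
        args[i].id = static_cast<long>(i);
        args[i].seen = &seen[i];
        int rc = pthread_create(&threads[i], nullptr, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (std::size_t i = 0; i < nthreads; ++i)
        pthread_join(threads[i], nullptr);
    return seen;
}
```

Because `args[i]` is never modified after the thread is created, there is no race on the id, and thread 0 reliably sees 0.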

Useless
  • If the whole file content is read I expect best results from completely linear read start to finish. Reading four quarters interleaved usually causes thrashing and serious slowdown. I would not use mmap for this case at all. – Balog Pal Jun 25 '13 at 13:54
  • Can you please tell me why I'm not getting the thread ids correctly in my parseMappedString function. I never get 0 as threadid which I should. And sometimes all 0 : – Pthread Jun 25 '13 at 13:54
  • @BalogPal if the file is sufficiently large that it spans RAID stripes then parallel reads might be useful, but I share your skepticism – Useless Jun 25 '13 at 14:04
0

getline() on a stream (std::istream::getline; there is no std::string::getline) is not thread-safe: you cannot safely call it on the same stream from different threads at once.

You either need to access a known position in the raw string data in memory using strncpy (C-style)

strncpy(temp, matrix_str.c_str(), 4000);

or use the substr function (C++-style)

std::string piece = matrix_str.substr(i, 4000);

EDIT: If your matrix_str is not a std::string but a std::stringstream object, this will not work, as a stream has to be accessed in order. Your question is a bit vague on that part...

PureW
  • Can you please tell me why I'm not getting the thread ids correctly in my parseMappedString function. I never get 0 as threadid which I should. And sometimes all 0 :( – Pthread Jun 25 '13 at 13:45
  • You are passing a pointer to the new thread containing your threadID. But the data this pointer points to is changing in your for-loop. So if you're lucky it works, but if not, you'll read a value while the for-loop changes it. – PureW Jun 25 '13 at 14:11
0

The code is almost fully serialized by the mutex, so there's no point in using threads at all.

The idea of parallelization is to allow work to actually be done at the same time. For that you need to reduce data sharing, ideally to zero.

For example, split the big string into 4 parts up front and hand one to each thread, so each thread can read and process its own part and place the result in its own exclusive location. The output can go straight into the matrix if no cells are shared, but be aware of false sharing, which could still ruin performance.
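A rough sketch of that approach, under stated assumptions: the names `Part`, `parse_part` and `parse_parallel` are made up for illustration, the only delimiter is a space, and results are gathered into one vector afterwards instead of being written directly into the matrix. Each thread accumulates into a function-local vector and publishes it once at the end, which keeps writes to neighbouring `Part` objects out of the hot loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <pthread.h>
#include <string>
#include <vector>

// One unit of work: a private input range and a private output vector,
// so the parsing loop needs no mutex.
struct Part {
    const char* begin;
    const char* end;
    std::vector<float> out;
};

static void* parse_part(void* p)
{
    Part* part = static_cast<Part*>(p);
    std::vector<float> local;  // thread-local accumulation
    const char* s = part->begin;
    while (s < part->end) {
        while (s < part->end && *s == ' ') ++s;   // skip delimiters
        const char* tok = s;
        while (s < part->end && *s != ' ') ++s;   // find the token end
        if (s > tok)  // copy the token so strtof sees a NUL terminator
            local.push_back(std::strtof(std::string(tok, s).c_str(), nullptr));
    }
    part->out.swap(local);  // publish the result once
    return nullptr;
}

// Parse text[0, size) with nparts threads; values come back in file order.
std::vector<float> parse_parallel(const char* text, std::size_t size,
                                  std::size_t nparts)
{
    assert(nparts > 0);
    std::vector<Part> parts(nparts);
    std::vector<pthread_t> threads(nparts);

    // Cut the input up front, moving each cut forward to the next space.
    std::size_t start = 0;
    for (std::size_t k = 0; k < nparts; ++k) {
        std::size_t stop = (k + 1 == nparts) ? size : size / nparts * (k + 1);
        if (stop < start) stop = start;
        while (stop < size && text[stop] != ' ') ++stop;
        parts[k].begin = text + start;
        parts[k].end = text + stop;
        start = stop;
    }

    for (std::size_t k = 0; k < nparts; ++k) {
        int rc = pthread_create(&threads[k], nullptr, parse_part, &parts[k]);
        assert(rc == 0);
        (void)rc;
    }

    std::vector<float> all;
    for (std::size_t k = 0; k < nparts; ++k) {
        pthread_join(threads[k], nullptr);
        all.insert(all.end(), parts[k].out.begin(), parts[k].out.end());
    }
    return all;
}
```

Whether this is actually faster than a single linear pass depends on the machine and the file, so it is worth measuring both.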

Balog Pal
0

On the weird 0 ID part: I thought the posted code was just a demonstration, but you may have it like that literally.

You must join all the threads before leaving the function create_threads_for_parsing, because you currently pass the threads a pointer to a local variable in it.

Worse, the variable is shared, so you have a race condition on it. You could do something like:

static const long int ids[] = {0, 1, 2, 3, 4};

and pass a pointer to the proper cell in the loop.

Balog Pal