
I'm trying to add parallelism to this function. I want it to use as many threads as possible and write the results to a file.

The results must be written to the file in increasing order: the first result first, the second result second, and so on.

The `keyGen` function is simply an MD5 of the integer `m`, which is used as the starting point for each chain. `reduction32` is a reduction function: it takes the first 8 bytes, adds `t`, and returns that value. When a chain reaches its endpoint, the endpoint is stored in the binary file.

Is there a smart way to make this parallel without messing up the order in which the endpoints are stored?

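```c
#include <stdint.h>
#include <stdio.h>

/* keyGen, keyschedule, kasumi_enc and reduction32 are defined elsewhere (not shown). */

void tableGenerator32(uint32_t * text){
    int mMax = 33554432, lMax = 236;
    int m, t, i;
    uint16_t * temp;
    uint16_t * key, ep[2];
    uint32_t tp;
    FILE * write_ptr;
    write_ptr = fopen("table32bits.bin", "wb");
    if (write_ptr == NULL) {             /* bail out instead of writing to NULL */
        perror("table32bits.bin");
        return;
    }
    for(m = 0; m < mMax ; m++){
        key = keyGen(m);                 /* start point of chain m */
        for (t = 0; t < lMax; t++){
            keyschedule(key);
            temp = kasumi_enc(text);
            tp = reduction32(t,temp);    /* reduce the ciphertext to 32 bits */
            temp[0]=tp>>16;
            temp[1]=tp;
            for(i=0; i < 8; i++){        /* next key: the 32-bit value repeated 4 times */
                key[i]=temp[i%2];
            }
        }
        for(i=0;i<2;i++)                 /* endpoint = first 32 bits of the final key */
            ep[i] = key[i];

        fwrite(ep,sizeof(ep),1,write_ptr);
    }
    fclose(write_ptr);
}
```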
Brian Tompsett - 汤莱恩
Becktor
  • What have you already tested? – prodev_paris Jun 16 '15 at 12:30
  • I have tested with and simply defining 5 or so threads that run the function, but I'm having issues because all the threads start the function from the beginning. So not much, really; I was hoping someone could point me in the right direction. The thing is, I want it to be scalable so that the thread count doesn't matter. – Becktor Jun 16 '15 at 12:40
  • Maybe you can start with a "simple" loop enhancer using [OpenMP](https://en.wikipedia.org/wiki/OpenMP)... in particular `#pragma omp parallel for`. Be careful about which variables should be _shared_ and which should not... – prodev_paris Jun 16 '15 at 12:49
  • ...And more precisely, look at the section "Synchronization clauses", because there is an `ordered` keyword that allows _the structured block to be executed in the order in which iterations would be executed in a sequential loop_. Used with care, this may let you preserve the order in your file... – prodev_paris Jun 16 '15 at 12:56
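The two OpenMP comments above can be combined into a minimal sketch, assuming the body of the outer `m` loop can be moved into a function that depends only on `m` and touches no shared mutable state. `compute_endpoint` and `generate_table_ordered` are hypothetical names, and the stand-in body is arbitrary integer mixing, not KASUMI. The chains run in parallel; only the `fwrite` inside `#pragma omp ordered` is serialized, and it executes in sequential `m` order. Compile with `-fopenmp` (GCC/Clang); without it the pragmas are ignored and the loop simply runs serially.

```c
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL stand-in for one chain (keyGen + lMax rounds of
   keyschedule/kasumi_enc/reduction32). It depends only on m and
   touches no shared mutable state. The mixing below is arbitrary. */
static void compute_endpoint(int m, uint16_t ep[2])
{
    uint32_t x = (uint32_t)m * 2654435761u;
    for (int t = 0; t < 236; t++)
        x = (x ^ (x >> 15)) * 2246822519u + (uint32_t)t;
    ep[0] = (uint16_t)(x >> 16);
    ep[1] = (uint16_t)x;
}

/* Computes the chains in parallel but writes their endpoints in
   increasing m order. Returns 0 on success, -1 on an I/O error. */
int generate_table_ordered(const char *path, int mMax)
{
    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;

    int failed = 0;
    /* m and ep are private (declared inside the loop);
       out and failed are shared. */
    #pragma omp parallel for ordered schedule(static, 1)
    for (int m = 0; m < mMax; m++) {
        uint16_t ep[2];
        compute_endpoint(m, ep);            /* runs concurrently */

        #pragma omp ordered
        {
            /* Entered one iteration at a time, in sequential m order,
               so the shared FILE* and flag are never used concurrently. */
            if (fwrite(ep, sizeof ep, 1, out) != 1)
                failed = 1;
        }
    }

    if (fclose(out) != 0)
        failed = 1;
    return failed ? -1 : 0;
}
```

The ordered region only holds a 4-byte write, so serializing it should cost little next to 236 encryptions per chain; `schedule(static, 1)` deals iterations out round-robin so consecutive `m` values land on different threads. An alternative is to drop `ordered`, have each iteration store its endpoint at index `m` of a preallocated array, and write the array once after the loop (about 128 MiB for `mMax = 33554432`). Either way, if the real `keyGen` or `kasumi_enc` return pointers to static or global buffers (the question does not say), threads would overwrite each other's data, and those helpers would need per-thread buffers first.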

1 Answer


The best way to parallelize the function above without running into concurrency issues is to create as many memory streams as threads you want to use and divide the work into contiguous ranges. For example, with 4 threads:

  • one thread performs the task from 0 to mMax / 4
  • one thread performs the task from mMax / 4 to (mMax / 4) * 2
  • one thread performs the task from (mMax / 4) * 2 to (mMax / 4) * 3
  • one thread performs the task from (mMax / 4) * 3 to (mMax / 4) * 4

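Then concatenate the result streams in thread order and write them to the file. Because each thread's range is contiguous and the ranges follow each other in increasing order, the endpoints end up in the same order as in the sequential version.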
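The approach in this answer can be sketched with POSIX threads, assuming the chain computation can be wrapped in a function that depends only on `m`. `compute_endpoint`, `struct chunk`, `worker` and `generate_table_chunked` are hypothetical names, and the stand-in chain is arbitrary mixing, not KASUMI. Each thread fills its own buffer (its "memory stream") for a half-open range `[begin, end)`. After all threads are joined, the buffers are written in range order, which reproduces the sequential order for any thread count.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for one chain (keyGen + lMax rounds of
   keyschedule/kasumi_enc/reduction32); depends only on m. */
static void compute_endpoint(int m, uint16_t ep[2])
{
    uint32_t x = (uint32_t)m * 2654435761u;   /* arbitrary mixing */
    for (int t = 0; t < 236; t++)
        x = (x ^ (x >> 15)) * 2246822519u + (uint32_t)t;
    ep[0] = (uint16_t)(x >> 16);
    ep[1] = (uint16_t)x;
}

struct chunk {
    int begin, end;   /* half-open range [begin, end) of m values */
    uint16_t *eps;    /* this thread's private output buffer */
};

static void *worker(void *arg)
{
    struct chunk *c = arg;
    for (int m = c->begin; m < c->end; m++)
        compute_endpoint(m, &c->eps[2 * (size_t)(m - c->begin)]);
    return NULL;
}

/* Splits [0, mMax) into nthreads contiguous ranges, computes them
   concurrently, then writes the buffers in range order.
   Returns 0 on success, -1 on failure. */
int generate_table_chunked(const char *path, int mMax, int nthreads)
{
    if (nthreads < 1)
        return -1;
    struct chunk *chunks = calloc((size_t)nthreads, sizeof *chunks);
    pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
    int started = 0, rc = 0;
    if (chunks == NULL || tids == NULL)
        rc = -1;

    for (int k = 0; rc == 0 && k < nthreads; k++) {
        /* Integer split also works when mMax % nthreads != 0. */
        chunks[k].begin = (int)((long long)mMax * k / nthreads);
        chunks[k].end = (int)((long long)mMax * (k + 1) / nthreads);
        size_t n = (size_t)(chunks[k].end - chunks[k].begin);
        chunks[k].eps = malloc((n + 1) * 2 * sizeof(uint16_t)); /* +1: never malloc(0) */
        if (chunks[k].eps == NULL ||
            pthread_create(&tids[k], NULL, worker, &chunks[k]) != 0)
            rc = -1;
        else
            started++;
    }

    for (int k = 0; k < started; k++)   /* wait for every running thread */
        pthread_join(tids[k], NULL);

    FILE *out = (rc == 0) ? fopen(path, "wb") : NULL;
    if (out == NULL)
        rc = -1;
    /* Concatenate: range 0, then range 1, ... gives increasing m order. */
    for (int k = 0; rc == 0 && k < nthreads; k++) {
        size_t n = (size_t)(chunks[k].end - chunks[k].begin) * 2;
        if (fwrite(chunks[k].eps, sizeof(uint16_t), n, out) != n)
            rc = -1;
    }
    if (out != NULL && fclose(out) != 0)
        rc = -1;

    for (int k = 0; chunks != NULL && k < nthreads; k++)
        free(chunks[k].eps);
    free(chunks);
    free(tids);
    return rc;
}
```

Because the ranges are computed from `nthreads`, the thread count is just a parameter, which addresses the scalability concern in the comments. Link with `-pthread` where the platform requires it. With equal-sized ranges, the slowest thread sets the total time, so this split works best when every chain costs about the same, as it does here with a fixed `lMax`.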

mg30rg