I have the following schematic parallelized for-loop (using OpenMP),

char command[200];
int thread_id;

# pragma omp parallel private(thread_id)
{
    thread_id = omp_get_thread_num();

    # pragma omp for
    for (int i = 0; i < max_value; i++) {
        // Generates a text file here (.inp) with a file name of node_(thread_id).inp

        std::sprintf(command, "executable.exe < node_%d.inp > node_%d.out", thread_id, thread_id);
        std::system(command);

        // Reads the output file node_(thread_id).out
    }
}

I compile the code with MPC 1.1.0 and GCC 8.4.0 and submit it to SLURM. The code sometimes runs fine, but at other times the file names come out wrong. For example, the .inp file becomes "nodee0.inp" or "node_0.o.o". Sometimes SLURM also reports a segmentation fault, and sometimes no files are written for node 0, i.e. the file names end with node 1 or 2, contrary to my expectation that there should always be a "node_0".

So my question is: what could cause the misnamed files, the segmentation faults, and the missing files for some nodes? The problem might lie with SLURM, since the code works fine at other times, but I wonder whether I should change something in the code to make these errors less frequent.

Thank you in advance.

magus_e
  • I would not use OMP for that, but rather an external library (for example boost.process) for handling processes. That way multithreading would be unnecessary. – Marek R Sep 10 '21 at 09:18
  • Every thread is probably using the same array to store the command. Where is `command` defined? – user253751 Sep 10 '21 at 10:10
  • @Marek R - I am not yet familiar with boost.process, but I found this post as a start: https://stackoverflow.com/questions/62602232/using-boost-to-create-multiple-child-processes-to-be-run-asynchronously. If my understanding is correct, I would need to write the files for all iterations first and then use them as input for the child processes. But `max_value` is so large that this could lead to half a million input files. – magus_e Sep 10 '21 at 17:15
  • @user253751 The `command` is defined outside the loop. – magus_e Sep 10 '21 at 17:16
  • @magus_e So there's your problem. – user253751 Sep 10 '21 at 17:18
  • @magus_e The functionality of `system` is extremely primitive, and I think that is the only reason you introduced threads. Multithreading is hard and entirely unnecessary in your case. That is why I recommend a third-party library that can handle processes asynchronously. – Marek R Sep 11 '21 at 18:41

1 Answer

You only have one command variable. Every thread writes into the same array and then runs the command from that array. If the thread gets lucky, no other thread changed the array in the meantime. If the thread doesn't get lucky, it executes some mish-mash of what it wanted to write and what some other thread wanted to write.

I'm not at all familiar with OpenMP or SLURM, but you should be able to fix that by moving the declaration of `command` inside the loop, so that each loop iteration gets its own array.
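As a minimal sketch of that change (not a tested fix for the original program), the buffer can be made local to each call. The helper names `build_command` and `run_all` and the `run` flag are hypothetical, introduced only for illustration. `std::snprintf` is swapped in for `std::sprintf` so the write is bounded by the buffer size.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: formats the command into a buffer that is local to
// the call, so concurrent callers never share storage.
std::string build_command(int thread_id) {
    char command[200];  // one buffer per call, not one for the whole program
    const int n = std::snprintf(command, sizeof command,
                                "executable.exe < node_%d.inp > node_%d.out",
                                thread_id, thread_id);
    assert(n > 0 && n < static_cast<int>(sizeof command));  // fits untruncated
    return std::string(command);
}

// Hypothetical driver mirroring the loop in the question. With run == false
// it only builds the commands and counts iterations, launching nothing.
int run_all(int max_value, bool run) {
    int done = 0;
    #pragma omp parallel reduction(+:done)
    {
#ifdef _OPENMP
        const int thread_id = omp_get_thread_num();
#else
        const int thread_id = 0;  // built without OpenMP: a single thread
#endif
        #pragma omp for
        for (int i = 0; i < max_value; i++) {
            // (write node_<thread_id>.inp here)
            const std::string command = build_command(thread_id);
            if (run) std::system(command.c_str());
            // (read node_<thread_id>.out here)
            ++done;
        }
    }
    return done;
}
```

Variables declared inside the parallel region or the loop body are private to each thread, which is why `thread_id` and `command` no longer need a `private` clause here.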

user253751
  • Thank you for this! I moved the definition of `command` inside the for-loop, and there seems to be no problem with the .inp and .out files now, although I am still experiencing segmentation faults. – magus_e Sep 11 '21 at 10:40