2

Instead of a parfor loop, I want to create 4 threads and write out the code for each individually. What's the syntax for that?

Alex Azazel
  • Do you really want to control which thread runs on which core? Using parallel computing in MATLAB, you typically distribute the jobs among the worker threads without specifying which or how many cores do the job. – Daniel Jul 19 '15 at 20:34
  • @Daniel46 Well, I have 4 different things to be done independently of each other, and all 4 of them take the same amount of time; can't I distribute them across 4 cores? Or suppose I have 3 different things to be done in parallel; how can I do it? – Alex Azazel Jul 19 '15 at 20:39

2 Answers

3

You have two options here. The first is to use parfeval, where you can request several independent function evaluations, like so:

% The following line executes
% out = a(a1, a2, a3) on a worker. (The number 1 is
% the number of outputs requested from the function evaluation.)
% The results can be obtained using
% out = fetchOutputs(f_a);
f_a = parfeval(@a, 1, a1, a2, a3);

% and so on...
f_b = parfeval(@b, 1, b1, b2);
f_c = parfeval(@c, 1, c1);
f_d = parfeval(@d, 1, d1, d2);

You can retrieve the results using fetchOutputs(f_a) etc.
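
For instance, once all four evaluations have been requested, collecting the results might look like this (a minimal sketch, reusing the futures and the hypothetical functions a–d from above):

out_a = fetchOutputs(f_a);  % blocks until f_a completes
out_b = fetchOutputs(f_b);
out_c = fetchOutputs(f_c);
out_d = fetchOutputs(f_d);

If you would rather process results in completion order, fetchNext accepts an array of futures such as [f_a, f_b, f_c, f_d] and returns, one call at a time, the index and value of the next future to complete.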

Another option is to use spmd like so:

spmd
    switch labindex
      case 1
        a();
      case 2
        b();
      ...
    end
end

Generally, for independent tasks, I would suggest parfeval since this approach is not dependent on the number of workers in your parallel pool, whereas the spmd approach is.
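
If you do go the spmd route and need results back on the client, assign them to a variable inside the block; any such variable comes back as a Composite, indexed by worker. A minimal sketch, assuming the same hypothetical functions a–d as above and a pool of exactly 4 workers (which is exactly the dependency mentioned):

spmd
    switch labindex
      case 1
        out = a();
      case 2
        out = b();
      case 3
        out = c();
      case 4
        out = d();
    end
end

% Back on the client, 'out' is a Composite: index it by worker.
r1 = out{1};  % result computed by worker 1
r2 = out{2};  % result computed by worker 2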

Edric
2

I recommend using Edric's answer or this solution.

(I leave the answer here for the comments.)

Forget about cores; you want to distribute your tasks among your worker processes.

Simple "hack" solution:

n = 4;
result = cell(n, 1);
parfor idx = 1:n
    % Dispatch a different task for each loop index.
    switch idx
        case 1
            r = f();
        case 2
            r = g(1);
        case 3
            r = g(2);
        case 4
            r = h();
    end
    result{idx} = r;
end

For a more advanced solution, I recommend creating individual jobs and submitting them. This is explained in detail here. In your case you would create a job with four tasks, then submit it. The biggest advantage of this solution is that you avoid unnecessary broadcasting of variables.
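
A minimal sketch of that jobs-and-tasks approach, assuming the same hypothetical functions and arguments as in Edric's answer (clus and job are arbitrary variable names):

clus = parcluster();                    % default cluster profile
job = createJob(clus);                  % an independent job
createTask(job, @a, 1, {a1, a2, a3});   % one task per function
createTask(job, @b, 1, {b1, b2});
createTask(job, @c, 1, {c1});
createTask(job, @d, 1, {d1, d2});
submit(job);
wait(job);                              % block until all tasks finish
out = fetchOutputs(job);                % 4-by-1 cell, one row per task
delete(job);                            % clean up when done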

In both solutions you don't control which worker processes which task, but you typically don't want to do this.

Daniel
  • Well, it works; however, it turns out my problem isn't big enough to benefit from this. Thank you anyway! I'm sure I'll use this hack some other day. And by the way, `result{idx}=r` should be outside the `switch`, right? After the `end`. – Alex Azazel Jul 19 '15 at 22:25
  • Ah, fixed the result. When using the Parallel Computing Toolbox, keep in mind that it uses inter-process communication via (localhost) network connections. Depending on the problem, that adds significant overhead. – Daniel Jul 19 '15 at 22:49
  • 2
    This is an undesirable solution because you're assuming that each worker will be given a single iteration of the `parfor` loop to operate on - this is not guaranteed to be the case. – Edric Jul 20 '15 at 08:13
  • @Edric: That is right; both solutions use automatic distribution of the jobs among the workers. The same is true for your solution using `parfeval`. – Daniel Jul 20 '15 at 08:33
  • 3
    That's not quite right. `parfor` batches iterations up for efficiency reasons, whereas `parfeval` does not. – Edric Jul 20 '15 at 09:24
  • @Edric: Thanks, I did not know that difference. – Daniel Jul 20 '15 at 09:25
  • @Daniel Do you know why the MATLAB PCT sends messages to localhost on shared-memory machines? It is typical for MPI implementations to be able to recognize shared-memory cases and not send messages, and the PCT is MPI-based, so I am not sure why the performance in these cases is necessarily so bad. But it makes the toolbox as a whole a lot less useful, that's for sure... – transversality condition Jul 20 '15 at 19:26
  • @MichaelJ: Not sure if shared memory really would be faster when communicating between two Java processes. I only know a solution which requires three layers of shared-memory communication. Anyone who is really concerned about the performance of a local pool should generate C code and enable multithreading via OpenMP instead. – Daniel Jul 20 '15 at 20:19
  • @Daniel I am no expert, but some reading, e.g., http://stackoverflow.com/a/13503059/4674830, seems to indicate that OpenMPI will make appropriate optimizations for communication between nodes with shared physical memory. – transversality condition Jul 21 '15 at 12:54