I have a loop within a loop. The outer loop goes through each file in a directory and runs the inner loop on it.

Can I select multiple files and have the inner loop run on them simultaneously?

I am using a 32-core server with 100 GB of RAM, and going through the files one by one I am only using 1% of the CPU.

I would like to speed this up by processing multiple files simultaneously, to make the most of my supercomputer and reduce the run time.

Lucas
  • You most likely need to use multiple threads (one per file). It will not happen magically by selecting two files: the loop will still process the first file synchronously and then the second. – abdul ahmad Dec 29 '16 at 13:49
  • Sometimes you can use parfor instead of for. – Datsheep Dec 29 '16 at 13:51
  • What kind of operation does the inner loop perform? If it is mainly file reads/writes, you won't save much time by running things in parallel, as the bottleneck will be your disk I/O anyway. If most of the time in the inner loop is spent on CPU computation, then parallelising may help. – Hoki Dec 29 '16 at 14:18
  • Look into parfor. – Mendi Barel Dec 29 '16 at 19:04
  • If each file is not too big and, as @Hoki pointed out, most of the time goes on CPU computation, you may want to consider first loading all the files and then doing the computation in a vectorized way (if possible). – EBH Dec 29 '16 at 19:09
  • Your question is somewhat vague. The answer is "yes, you can select multiple files and process them simultaneously", but the real question is "how?". You will probably need to use multiple threads, each processing one file independently. I'm no expert in MATLAB, but a web search should turn up plenty of resources; here's a tutorial I found: http://www.instructables.com/id/Matlab-Multithreading-EASY/ – abdul ahmad Dec 29 '16 at 13:51
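
Following @Hoki's point, a quick way to see which case applies is to time the read and the computation for a single file separately with tic/toc. This is a hypothetical sketch: fname, the demo file it writes and the character-summing loop stand in for the real data and the real inner loop. If tRead dominates, running files in parallel is unlikely to help much; if tCompute dominates, parallelising is more promising. A single timing is only a rough guide, since the operating system's file cache can make repeated reads much faster than the first one.

```matlab
% Rough check of where the time goes for one file. fname and the summing
% loop are hypothetical stand-ins; a demo file is written so this runs as-is.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s', repmat('x', 1, 1e5));
fclose(fid);

t0 = tic;
data = fileread(fname);                  % I/O part: read the whole file
tRead = toc(t0);

t0 = tic;
acc = 0;
for n = 1:numel(data)                    % CPU part: stand-in for the inner loop
    acc = acc + double(data(n));
end
tCompute = toc(t0);

fprintf('read: %.4f s, compute: %.4f s\n', tRead, tCompute);
```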

1 Answer

There are several options to parallelise a MATLAB script:

  1. If you have a licence for the Parallel Computing Toolbox, you can replace the outer loop with a parfor loop. See this.
  2. If you do not have that licence, you can use the third-party Multicore package. You will need to restructure your code into a master part and a slave part. See this.
  3. If you do not want to rethink your code too much, you can remove the outer loop and turn the script into a function that accepts a filename as an argument. Then use GNU Parallel to launch as many MATLAB instances of it as there are processors in the machine, and keep doing so until all files are processed. See this.
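
For option 1, here is a minimal sketch of what replacing the outer for with parfor can look like, assuming the Parallel Computing Toolbox is available (without it, parfor simply runs as an ordinary serial loop). The demo folder, the '*.txt' pattern and the character-summing inner loop are hypothetical stand-ins for the real files and the real inner loop; a few throwaway files are written first so the script runs as-is.

```matlab
% Sketch only: the demo folder, the '*.txt' pattern and the summing inner
% loop are hypothetical stand-ins for the real data and the real work.
dataDir = tempname;                      % throwaway demo folder
mkdir(dataDir);
for k = 1:4                              % write four small demo files
    fid = fopen(fullfile(dataDir, sprintf('file%d.txt', k)), 'w');
    fprintf(fid, '%s', repmat('a', 1, k));
    fclose(fid);
end

files = dir(fullfile(dataDir, '*.txt'));
names = {files.name};                    % plain cell array, sliced by parfor
results = zeros(numel(names), 1);        % preallocated sliced output

% Optional: parpool('local', 32) starts 32 workers explicitly; otherwise
% parfor opens a pool with the default settings.
parfor k = 1:numel(names)
    data = fileread(fullfile(dataDir, names{k}));  % each iteration reads its own file
    acc = 0;
    for n = 1:numel(data)                % inner loop left unchanged
        acc = acc + double(data(n));
    end
    results(k) = acc;                    % one independent result per file
end
disp(results.')
```

The file names are copied into a plain cell array and the output is preallocated so that parfor can slice both by k; each iteration touches only its own file and its own entry of results, which is the independence parfor requires. If the body of the parfor is moved into a function that takes one filename, that same function is also the per-file entry point that option 3 calls for.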
damienfrancois