
I am using MATLAB's quadprog and it runs extremely slowly on my local machine.

When I run the exact same code on a remote machine, it completes within 10 minutes. When I run it on my local machine, it doesn't terminate even after 24 hours (I kill it at some point).

While the code runs, the memory usage on my local machine is ~10GB RAM (while my local machine has ~100GB of free RAM). The usage on the remote machine is 20-30GB RAM.

Any idea on what to do to make it run faster on my local machine?

Important EDIT 18 Oct.: I executed a smaller-scale problem on both machines. On the local machine it takes 1900 sec, on the remote it takes 8 sec, a speedup of ~x240. Both machines have multiple multi-core processors. This time I noticed with htop that the remote machine uses all its processors, whilst the local machine uses only a single processor (although all the others are available). Any idea how I can make MATLAB use all processors on the local machine?


Some side notes:

1: nnz for H, Aeq =~ 10e6, dimensions are approx 11e6 x 11e6

2: Quadratic programming with only equality constraints has a closed-form solution (see Boyd). When I solve it with the closed-form solution, it takes ~10 minutes on my local machine vs. 5 minutes on the remote machine, and both consume ~20-30GB of memory. Since I would like to add inequality constraints, I would like to be able to run quadprog quickly on my local machine.

3: Below is cat /proc/cpuinfo for my machine vs. the remote machine (the remote machine is stronger, but the local machine is also strong). 14 cores vs. 4 cores is a gain of ~x3.5 (ignoring multithreading overhead), and AVX vs. SSE is at most ~x2. So this doesn't explain the gain of ~x240 that I see. Also, when I use the closed-form solution (instead of quadprog), the remote machine has a gain of only x2 over the local machine.

4: I am sure I am running the 64-bit version, because I see that the memory consumption is 10-15GB.

5: The local system runs RHEL, the remote runs Ubuntu.

Local uname -a results:

 Linux hostname 2.6.32-573.7.1.el6.x86_64 #1 SMP Thu Sep 10 13:42:16 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux    

Remote uname -a results:

 Linux hostname 3.13.0-65-generic #105-Ubuntu SMP Mon Sep 21 18:50:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux    

6: Hyper threading is enabled on the machine. I checked it with this script.

7: Starting parallel pool as someone suggested doesn't help.
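
For reference, the closed form mentioned in side note 2 follows from the KKT conditions: for min 0.5*x'*H*x + f'*x subject to Aeq*x = beq, the solution satisfies the linear system [H Aeq'; Aeq 0] * [x; nu] = [-f; beq]. A minimal sketch on a toy problem is shown here; the 2-variable data is hypothetical and only illustrates the structure, it is not the actual 11e6 x 11e6 problem:

```matlab
% Toy equality-constrained QP (hypothetical data, for illustration only):
%   min 0.5*x'*H*x + f'*x   subject to   Aeq*x = beq
H   = sparse([2 0; 0 2]);
f   = [-2; -5];
Aeq = sparse([1 1]);
beq = 1;

n = size(H, 1);     % number of variables
m = size(Aeq, 1);   % number of equality constraints

% Assemble the sparse KKT matrix and solve it with backslash.
K   = [H, Aeq'; Aeq, sparse(m, m)];
sol = K \ [-f; beq];

x  = sol(1:n);        % primal solution
nu = sol(n+1:end);    % Lagrange multipliers of the equality constraints
```

The same toy problem can be passed to quadprog as quadprog(H, f, [], [], Aeq, beq), which makes it easy to compare the two approaches on both machines before scaling up.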

Thanks!

Local machine cpu info of a single processor (out of many)

vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
stepping        : 5
microcode       : 25
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 1
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 23
initial apicid  : 23
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4532.68
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Remote machine cpu info of a single processor (out of many)

vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
stepping        : 2
microcode       : 0x2d
cpu MHz         : 1200.000
cache size      : 35840 KB
physical id     : 1
siblings        : 28
core id         : 14
cpu cores       : 14
apicid          : 61
initial apicid  : 61
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips        : 5189.05
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
Yuval Atzmon
  • What version of MATLAB? – Royi Oct 17 '15 at 15:48
  • I use 2014b version. – Yuval Atzmon Oct 17 '15 at 17:44
  • 3
    On both computers? Is there a chance one of them supports AVX and the other doesn't or something like that? Moreover, the other computer has 14 cores, if this implementation can handle them it might be it. – Royi Oct 17 '15 at 18:43
  • 1
    Just as a note: MATLAB does not support multi/hyper threading, since it was found to be slower than optimising usage of all physical cores. – Adriaan Oct 19 '15 at 10:49
  • Based on my very limited experience with Linux, I'd suggest you do an extreme experiment (find a spare disk, install only RHEL and MATLAB, rerun the script) to find out the answer, because I think it could be coming from something outside of MATLAB. I don't see any immediate reason why MATLAB can't use multithreading / multicore computation on your local machine, which leads me to believe that it may be something else you did to the system that made MATLAB behave differently. It might be worth testing if your MATLAB is performing any multicore processing at all. – user3667217 Oct 23 '15 at 08:57
  • Here's a list of the functions that are multi-threaded in MATLAB: http://www.mathworks.com/matlabcentral/answers/95958-which-matlab-functions-benefit-from-multithreaded-computation – user3667217 Oct 23 '15 at 08:58
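
One way to run the check suggested in the comments above (whether MATLAB is doing any multicore processing at all) is to time a multithreaded operation at the default thread count and again with a single thread. This is a minimal sketch, assuming maxNumCompThreads is available in the installed release; the matrix size n is an arbitrary choice and may need adjusting:

```matlab
% Report how many computational threads MATLAB is currently allowed to use.
nThreads = maxNumCompThreads;
fprintf('MATLAB computational threads: %d\n', nThreads);

% Dense matrix multiplication is one of MATLAB's multithreaded operations.
n = 4000;            % arbitrary size, chosen only to make the timing visible
A = rand(n);

tic; B = A * A; tMulti = toc;          % default thread count

oldThreads = maxNumCompThreads(1);     % force one thread, keep the old setting
tic; B = A * A; tSingle = toc;
maxNumCompThreads(oldThreads);         % restore the previous setting

fprintf('default: %.2f s, single thread: %.2f s, ratio: %.1f\n', ...
        tMulti, tSingle, tSingle / tMulti);
```

If the reported thread count is 1, or the ratio stays close to 1 on a multi-core machine, MATLAB is probably not using multiple cores there. Possible causes worth ruling out include MATLAB having been started with the -singleCompThread option or a CPU affinity restriction on the process, though neither is confirmed by anything in this thread.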

1 Answer


The answer, unless it is something in the OS configuration, has to do with one of the following:

  1. The remote computer has 14 cores. If the implementation can use all 14 cores, it will benefit significantly from them.
  2. The CPU of the remote computer, the Intel Xeon E5-2697 v3, supports AVX and AVX2.
    The CPU of the local computer, the Intel Xeon E5520, supports only up to SSE 4.2 and has a lower clock frequency.
    If the algorithm can utilize those vectorized instructions (and I think 2014b uses an MKL version that is new enough to do that), it should boost the performance significantly.

Taking all that into account explains what you see.

Royi
  • 1
    Thanks, but 14 cores vs. 4 cores is a gain of ~x3.5 (ignoring multithreading overhead), and AVX vs. SSE is ~x2. I killed the process after 24 hours (vs. 10 minutes!), which is a gain of at least x144. Can AVX2 and 14 cores explain such a huge gain? Also, when I use the closed-form solution, the remote machine has a gain of only x2 over the local machine, although the closed-form solution should also take advantage of the MKL. – Yuval Atzmon Oct 17 '15 at 21:07