I am using MATLAB's quadprog and it runs extremely slowly on my local machine.
When I run the exact same code on a remote machine, it completes within 10 minutes. When I run it on my local machine, it doesn't terminate even after 24 hours (I kill it at some point).
While the code runs, the memory usage on my local machine is ~10GB RAM (while my local machine has ~100GB of free RAM). The usage on the remote machine is 20-30GB RAM.
Any idea what I can do to make it run faster on my local machine?
Important EDIT 18 Oct.: I executed a smaller-scale problem on both machines. On the local machine it takes 1900 sec, on the remote it takes 8 sec, a gain of ~x240. Both machines have multiple multi-core processors. This time I noticed with htop that the remote machine uses all its processors, whilst the local machine uses only a single processor (although all the others are available). Any idea how I can make MATLAB use all processors on the local machine?
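One thing that can be checked from the shell is whether the process is restricted to a single thread or a single CPU. The following is a minimal sketch, assuming a Linux /proc filesystem; thread_summary is a hypothetical helper name, and the two environment variables are only common suspects for thread caps, not a confirmed cause of this behaviour.

```shell
#!/bin/sh
# Print how many threads a process has and which CPUs it may run on,
# read from a Linux /proc/<pid>/status file passed as the first argument.
thread_summary() {
    awk -F':[ \t]*' '
        /^Threads:/           { threads = $2 }
        /^Cpus_allowed_list:/ { cpus = $2 }
        END { printf "threads=%s cpus_allowed=%s\n", threads, cpus }
    ' "$1"
}

# Environment variables that commonly cap OpenMP/MKL thread counts.
# Whether they influence quadprog here is an assumption to verify.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS:-unset}"

# Demo on the current shell; substitute the MATLAB process ID for $$.
if [ -r "/proc/$$/status" ]; then
    thread_summary "/proc/$$/status"
fi
```

Running this on both machines while quadprog is active (with the MATLAB process ID in place of `$$`) would show whether the local process has fewer threads or a narrower allowed-CPU list than the remote one.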
Some side notes:
1: nnz for H, Aeq is ~10e6; dimensions are approx. 11e6 x 11e6
2: Quadratic programming with only equality constraints has a closed-form solution (see Boyd). When I solve it with the closed-form solution, it takes ~10 minutes on my local machine vs 5 minutes on the remote machine, and both consume ~20-30GB of memory. Since I would like to add inequality constraints, I would like to be able to run quadprog quickly on my local machine.
3: Below is cat /proc/cpuinfo on my machine vs the remote machine (the remote machine is stronger, but the local machine is also strong). 14 cores vs 4 cores is a gain of ~x3.5 (ignoring multi-threading overhead), and AVX vs SSE is at most ~x2. So the hardware doesn't explain the gain of ~x240 that I see. Also, when I use the closed-form solution (instead of quadprog), the remote machine has a gain of only x2 over the local machine.
4: I am sure I am running the 64-bit version, because I see that the memory consumption is 10-15GB.
5: The local system runs RHEL, the remote runs Ubuntu.
Local uname -a results:
Linux hostname 2.6.32-573.7.1.el6.x86_64 #1 SMP Thu Sep 10 13:42:16 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Remote uname -a results:
Linux hostname 3.13.0-65-generic #105-Ubuntu SMP Mon Sep 21 18:50:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
6: Hyper-threading is enabled on the machine. I checked it with this script.
7: Starting a parallel pool, as someone suggested, doesn't help.
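For reference on side note 2: the closed form for the equality-constrained problem is presumably the KKT linear system. The block below is a sketch under stated assumptions, namely quadprog's notation (H, f, Aeq, beq), a convex objective, and a nonsingular KKT matrix; the multiplier symbol is introduced here for illustration.

```latex
\begin{align}
  &\min_{x} \; \tfrac{1}{2} x^{\top} H x + f^{\top} x
   \quad \text{s.t.} \quad A_{eq}\, x = b_{eq} \\[4pt]
  &\begin{bmatrix} H & A_{eq}^{\top} \\ A_{eq} & 0 \end{bmatrix}
   \begin{bmatrix} x^{\star} \\ \lambda^{\star} \end{bmatrix}
   =
   \begin{bmatrix} -f \\ b_{eq} \end{bmatrix}
\end{align}
```

This is a single sparse linear solve, which is why it does not carry over once inequality constraints are added and an iterative solver such as quadprog is needed.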
Thanks!
Local machine cpu info of a single processor (out of many)
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
stepping : 5
microcode : 25
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 23
initial apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4532.68
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
Remote machine cpu info of a single processor (out of many)
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
stepping : 2
microcode : 0x2d
cpu MHz : 1200.000
cache size : 35840 KB
physical id : 1
siblings : 28
core id : 14
cpu cores : 14
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips : 5189.05
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
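The fields relevant to the comparison in side notes 3 and 6 can be pulled out of the two listings mechanically. This is a minimal sketch, assuming a /proc/cpuinfo layout like the ones above; cpu_summary is a hypothetical helper name, and "siblings greater than cpu cores" is used as the usual heuristic for hyper-threading being enabled.

```shell
#!/bin/sh
# Summarise a /proc/cpuinfo-style file given as the first argument.
# Prints: cores=<n> siblings=<n> ht=<yes|no> avx=<yes|no>
# If the file lists several processors, the last entry wins.
cpu_summary() {
    awk -F': *' '
        BEGIN        { avx = "no" }
        /^cpu cores/ { cores = $2 }
        /^siblings/  { siblings = $2 }
        /^flags/     { avx = ($2 ~ /(^| )avx( |$)/) ? "yes" : "no" }
        END {
            # More logical siblings than physical cores implies hyper-threading.
            ht = (siblings + 0 > cores + 0) ? "yes" : "no"
            printf "cores=%d siblings=%d ht=%s avx=%s\n", cores, siblings, ht, avx
        }
    ' "$1"
}

# Demo on the machine running the script, if /proc/cpuinfo is available.
if [ -r /proc/cpuinfo ]; then
    cpu_summary /proc/cpuinfo
fi
```

Applied to the two listings above, this should report 4 cores and 8 siblings for the local machine and 14 cores and 28 siblings for the remote one, with the avx flag present only on the remote machine.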