
I have a MATLAB simulation that updates an array:

Array=zeros(1,1000);

as follows:

for j=1:100000
    Array=Array+rand(1,1000);
end

My question is the following: this loop is sequential, so its iterations cannot be run in parallel, but the different slots of the array are updated independently of each other. So, naturally, MATLAB performs array operations such as this one in parallel using all the cores of the CPU.

I wish to perform the same calculation on my NVIDIA GPU in order to speed it up (using the larger number of cores there).

The problem is that naively doing

tic 
Array=gpuArray(zeros(1,1000));
for j=1:100000 
    Array=Array+gpuArray(rand(1,1000));  
end  
toc 

makes the calculation take 8 times longer!

a. What am I doing wrong?

Update: b. Could someone provide a different simple example for which GPU computing is beneficial? My aim is to understand how I can use it in MATLAB for very "heavy" stochastic simulations (multiple linear operations on big arrays and matrices).

talonmies
user1611107
  • Possibly related Q&A (if your problem is generating random numbers): https://stackoverflow.com/q/39251206/3372061 – Dev-iL Jan 17 '18 at 17:18

2 Answers


Nothing.

This is how GPU computing works. Unfortunately, it is not magic. CPU-GPU communication is slow, very slow. On every iteration you create an array on the CPU and send it to the GPU, and that transfer is the slow part. I am sure that the "+" operation, which is already ridiculously fast on the CPU, is even faster on the GPU, but the improvement is completely overshadowed by the time it takes to send the data to the GPU.

Your code, as is, has little room for improvement.
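A minimal sketch of how one might check this on a given machine, assuming Parallel Computing Toolbox and a supported GPU: `gputimeit` times the host-to-device copy of one 1-by-1000 array and, separately, the GPU addition on data that is already on the device. The variable names are illustrative, and no particular timings are implied; the ratio depends on the hardware.

```matlab
% Sketch: compare transfer cost with compute cost for one loop iteration.
% Assumes Parallel Computing Toolbox and a supported NVIDIA GPU.
hostData = rand(1, 1000);                  % data created on the CPU
devData  = gpuArray(hostData);             % one-off copy to the GPU
acc      = zeros(1, 1000, 'gpuArray');     % accumulator living on the GPU

% gputimeit runs the function several times and waits for the GPU to finish
tTransfer = gputimeit(@() gpuArray(hostData));   % host-to-device copy only
tAdd      = gputimeit(@() acc + devData);        % element-wise add on the GPU

fprintf('transfer: %.3e s, add: %.3e s\n', tTransfer, tAdd);
```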

Ander Biguri
  • Thanks. Pity. Of course, I just tried to give a simple test case. However, can you provide a different simple example for which GPU computing is beneficial? My aim is to understand how I CAN use it in MATLAB. – user1611107 Jan 17 '18 at 14:42
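A sketch of the kind of workload where the GPU is more likely to pay off, assuming Parallel Computing Toolbox and a supported GPU: the data is created on the device once, every iteration does a lot of arithmetic on it (here a dense matrix product), and only the final result is copied back with `gather`. The sizes `N` and `nSteps` are arbitrary illustrative choices, and whether the GPU version actually wins depends on the hardware.

```matlab
% Sketch: heavy per-iteration work on data that never leaves the GPU.
% N and nSteps are arbitrary illustrative sizes, not tuned values.
N = 2000;
nSteps = 20;

%% CPU reference
A = rand(N);
X = zeros(N);
tic
for k = 1:nSteps
    X = X + A * rand(N);          % dense N-by-N matrix product each step
end
cpuTime = toc

%% GPU version: all arrays are created on the device
dev = gpuDevice();
Ag = rand(N, 'gpuArray');
Xg = zeros(N, 'gpuArray');
tic
for k = 1:nSteps
    Xg = Xg + Ag * rand(N, 'gpuArray');   % random numbers generated on the GPU
end
wait(dev);                        % GPU calls are asynchronous; wait before toc
gpuTime = toc

Xfinal = gather(Xg);              % single device-to-host copy at the end
```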

It probably won't help the overall speed (as @Ander mentioned in his answer), but one small improvement you could make is to generate the random numbers directly on the GPU, like so:

rand(1, 1000, 'gpuArray')

In general, random number generation on the GPU is much faster than on the CPU.
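Applied to the loop from the question, a sketch might look like the following, assuming Parallel Computing Toolbox and a supported GPU. The random numbers are created on the device, so nothing is copied from the CPU inside the loop; `wait` on the device object makes sure the asynchronous GPU work has finished before `toc` is read, and `gather` copies the result back once at the end.

```matlab
% Sketch: the question's loop with on-device random number generation.
dev = gpuDevice();
tic
Array = zeros(1, 1000, 'gpuArray');
for j = 1:100000
    Array = Array + rand(1, 1000, 'gpuArray');   % no host-to-device copy
end
wait(dev);                  % GPU work is queued asynchronously; wait for it
gpuLoopTime = toc
result = gather(Array);     % single copy back to the CPU
```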

You can go further by using the gpuArray version of arrayfun, which JIT-compiles the body into native GPU code. On my GPU (Tesla K20c), this makes the GPU version 10x faster than the CPU version. Here's the full script:

%% CPU version
tic
Array=zeros(1,1000);
for j=1:100000
    Array=Array+rand(1,1000);
end
cpuTime = toc

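%% GPU version
dev = gpuDevice();
tic
Array = zeros(1, 1000, 'gpuArray');
% arrayfun runs iFcn once per element, in parallel on the GPU
Array = arrayfun(@iFcn, Array);
wait(dev);   % GPU work is asynchronous; wait for it before stopping the timer
gpuTime = toc

%% Arrayfun body
% Local function in a script (R2016b or later), or save it as iFcn.m
function x = iFcn(x)
% x is a scalar here; each call to rand returns one random number
for j = 1:100000
    x = x + rand;
end
end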
Edric
  • @Edric Thanks, but there seems to be a problem: when I run your code I get "Error using gpuArray/arrayfun Unable to resolve the function handle.". Can you see what the problem is, please? – user1611107 Jan 18 '18 at 13:18
  • You need iFcn to be a function on the MATLAB path, either in its own file or (in recent versions of MATLAB that support functions inside scripts) in the script file together with the arrayfun call. In R2017b, placing the entire text into a single script file should work. – Edric Jan 18 '18 at 14:36