I am facing a problem with explicit buffer management in Aparapi.
The code below performs several put/get calls in a loop to refresh data on the GPU and read the results back. It seems that only the first put and get are actually performed; the later ones are not.
import com.amd.aparapi.Kernel;

// Dummy test to reproduce explicit buffer management
public class QuickTestExplicit extends Kernel
{
    private static final float DELTA = (float) 1E-5;

    // will be filled and put on the GPU in each iteration
    private float[][] values;

    // will be filled with results; put on the GPU once but retrieved several times
    private float[] currentRes;

    private void initData()
    {
        values = new float[2000][20];
        currentRes = new float[2000];
    }

    @Override
    public void run()
    {
        int id = getGlobalId();
        long accum = 0;
        // simple sum of elements
        for (int index = 0; index < 20; ++index)
        {
            accum += values[id][index];
        }
        currentRes[id] = accum;
    }

    public void process()
    {
        boolean passed = true;
        initData();
        if (isExplicit())
        {
            put(currentRes);
        }
        for (int row = 0; row < 2000; ++row)
        {
            for (int i = 0; i < values.length; ++i)
            {
                for (int depth = 0; depth < 20; ++depth)
                {
                    values[i][depth] = (float) row;
                }
            }
            if (isExplicit())
            {
                put(values);
            }
            execute(values.length);
            if (isExplicit())
            {
                get(currentRes);
            }
            // just check the success of the operation (for the example)
            passed = true;
            for (int currentIndexRes = 0; currentIndexRes < currentRes.length; ++currentIndexRes)
            {
                passed &= Math.abs(currentRes[currentIndexRes] - (row * 20.0)) < DELTA;
            }
            if (passed)
            {
                System.out.println("ROW " + row + " PASSED");
            }
            else
            {
                System.out.println("ROW " + row + " FAILED");
            }
        }
    }

    public static void main(String[] args)
    {
        QuickTestExplicit kern = new QuickTestExplicit();
        kern.setExecutionMode(EXECUTION_MODE.GPU);
        kern.setExplicit(true);
        kern.process();
    }
}

So my questions are:
- How can I force the update of a large buffer that has already been put in GPU memory?
- Why is a SIGSEGV thrown when I run this code with implicit buffer management?
I don't think the problem is related to GPU memory capacity: I have 2 GB of memory, and the application only puts 2000*20*4 + 2000*4 bytes = 168 KB. I'm using a CUDA architecture. FYI, this program passes when running in JTP mode.
Thanks in advance!
EDIT: I forgot to mention that I'm using the "Aparapi_2014_04_29_Linux64" version available in svn/trunk/Downloads.
It seems that the problem occurs when using 2D Java primitive arrays. I rewrote the same algorithm using 1D Java primitive arrays, and that worked perfectly... Any ideas?
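
For reference, here is a minimal sketch of what such a 1D rewrite could look like, assuming a row-major layout in which element [i][depth] of the 2D array is stored at index i * 20 + depth. The class and helper names (FlattenedSum, fill, sumRow, sumRows) are hypothetical, and the Aparapi calls are left out on purpose so the indexing can be checked on a plain JVM. In the real kernel, id would come from getGlobalId() and the body of sumRow would sit in run().

```java
import java.util.Arrays;

// Plain-JVM sketch of the flattened (1D) layout; no Aparapi calls are used here.
class FlattenedSum
{
    // Hypothetical helper: a rows x depth matrix stored row-major in a single float[].
    static float[] fill(int rows, int depth, float value)
    {
        float[] flat = new float[rows * depth];
        Arrays.fill(flat, value);
        return flat;
    }

    // Stand-in for the kernel body of one work item: sums the 'depth' elements of row 'id'.
    static float sumRow(float[] flat, int depth, int id)
    {
        float accum = 0f;
        int base = id * depth; // start of row 'id' in the flattened array
        for (int index = 0; index < depth; ++index)
        {
            accum += flat[base + index];
        }
        return accum;
    }

    // Stand-in for execute(rows): runs the kernel body once per row, sequentially.
    static float[] sumRows(float[] flat, int rows, int depth)
    {
        float[] res = new float[rows];
        for (int id = 0; id < rows; ++id)
        {
            res[id] = sumRow(flat, depth, id);
        }
        return res;
    }

    public static void main(String[] args)
    {
        float[] flat = fill(2000, 20, 3.0f);
        float[] res = sumRows(flat, 2000, 20);
        System.out.println("first row sum = " + res[0]);
    }
}
```

With this layout, the explicit put/get calls operate on a single float[] rather than on a float[][], and that is the only structural difference from the failing version above.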