My goal is to use the GPU in my brand-new Java project, which is to create a game and its game engine (I think it is a very good way to learn in depth how this works).
I was using multi-threading on the CPU with java.awt.Graphics2D to display my game, but I observed on other PCs that the game ran below 40 FPS, so I decided to learn how to use the GPU (I will still render all objects in a for loop and then draw the image on screen).
For that reason, I started writing a small test, following the OpenCL documentation and the JOCL samples, that paints a texture onto the background image (let's assume that every entity has a texture).
This method is called on each render call and is given the background, the texture, and the position of the entity as arguments.
Both code listings below have been updated to follow @ProjectPhysX's recommendations.
public static void XXX(final BufferedImage output_image, final BufferedImage input_image, float x, float y) {
cl_image_format format = new cl_image_format();
format.image_channel_order = CL_RGBA;
format.image_channel_data_type = CL_UNSIGNED_INT8;
//allocate output pointer
cl_image_desc output_description = new cl_image_desc();
output_description.buffer = null; //must be null for 2D image
output_description.image_depth = 0; //is only used if the image is a 3D image
output_description.image_row_pitch = 0; //must be 0 if host_ptr is null
output_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
output_description.num_mip_levels = 0; //must be 0
output_description.num_samples = 0; //must be 0
output_description.image_type = CL_MEM_OBJECT_IMAGE2D;
output_description.image_width = output_image.getWidth();
output_description.image_height = output_image.getHeight();
output_description.image_array_size = output_description.image_width * output_description.image_height;
cl_mem output_memory = clCreateImage(context, CL_MEM_WRITE_ONLY, format, output_description, null, null);
//set up first kernel arg
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(output_memory));
//allocates input pointer
cl_image_desc input_description = new cl_image_desc();
input_description.buffer = null; //must be null for 2D image
input_description.image_depth = 0; //is only used if the image is a 3D image
input_description.image_row_pitch = 0; //must be 0 if host_ptr is null
input_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
input_description.num_mip_levels = 0; //must be 0
input_description.num_samples = 0; //must be 0
input_description.image_type = CL_MEM_OBJECT_IMAGE2D;
input_description.image_width = input_image.getWidth();
input_description.image_height = input_image.getHeight();
input_description.image_array_size = input_description.image_width * input_description.image_height;
DataBufferInt input_buffer = (DataBufferInt) input_image.getRaster().getDataBuffer();
int input_data[] = input_buffer.getData();
cl_mem input_memory = clCreateImage(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, format, input_description, Pointer.to(input_data), null);
//loads the input buffer to the gpu memory
long[] input_origin = new long[] { 0, 0, 0 };
long[] input_region = new long[] { input_image.getWidth(), input_image.getHeight(), 1 };
int input_row_pitch = input_image.getWidth() * Sizeof.cl_uint; //the length of each row in bytes
clEnqueueWriteImage(commandQueue, input_memory, CL_TRUE, input_origin, input_region, input_row_pitch, 0, Pointer.to(input_data), 0, null, null);
//set up second kernel arg
clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(input_memory));
//set up third and fourth kernel args
clSetKernelArg(kernel, 2, Sizeof.cl_float, Pointer.to(new float[] { x }));
clSetKernelArg(kernel, 3, Sizeof.cl_float, Pointer.to(new float[] { y }));
//blocks until all previously queued commands have completed
clFinish(commandQueue);
//enqueue the program execution
long[] globalWorkSize = new long[] { input_description.image_width, input_description.image_height };
clEnqueueNDRangeKernel(commandQueue, kernel, 2, null, globalWorkSize, null, 0, null, null);
//transfer the output result back to host
DataBufferInt output_buffer = (DataBufferInt) output_image.getRaster().getDataBuffer();
int output_data[] = output_buffer.getData();
long[] output_origin = new long[] { 0, 0, 0 };
long[] output_region = new long[] { output_description.image_width, output_description.image_height, 1 };
int output_row_pitch = output_image.getWidth() * Sizeof.cl_uint;
clEnqueueReadImage(commandQueue, output_memory, CL_TRUE, output_origin, output_region, output_row_pitch, 0, Pointer.to(output_data), 0, null, null);
//free pointers
clReleaseMemObject(input_memory);
clReleaseMemObject(output_memory);
}
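To make the data layout in the method above easier to reason about, here is a minimal sketch of how one packed pixel from the `int[]` array sits in host memory. It assumes the image is a `BufferedImage.TYPE_INT_ARGB` (the cast to `DataBufferInt` suggests an int-packed type, but the exact type is not shown) and a little-endian host. The class and method names are hypothetical and only serve the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of JOCL: shows the byte order of one packed ARGB int.
final class ArgbByteLayout {

    /** Returns the four bytes of a packed ARGB pixel in memory order (little-endian host assumed). */
    static int[] bytesInMemory(int argb) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(argb);
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) {
            out[i] = buf.get(i) & 0xFF; // mask to read the byte as unsigned
        }
        return out;
    }

    public static void main(String[] args) {
        int argb = 0xFF090C13; // same bit pattern as -16184301 used in the kernel
        // Memory order is blue, green, red, alpha.
        System.out.println(Arrays.toString(bytesInMemory(argb))); // prints [19, 12, 9, 255]
    }
}
```

Under these assumptions the bytes arrive as B, G, R, A, so with `CL_RGBA` and `CL_UNSIGNED_INT8` the first component seen by the kernel would likely hold the blue channel rather than red.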
And here is the program source that runs as the kernel.
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
__kernel void drawImage(__write_only image2d_t dst_image, __read_only image2d_t src_image, float xoff, float yoff)
{
const int x = get_global_id(0);
const int y = get_global_id(1);
int2 in_coords = (int2) { x, y };
uint4 pixel = read_imageui(src_image, sampler, in_coords);
pixel = -16184301;
printf("%d, %d, %u\n", x, y, pixel);
const int sx = get_global_size(0);
const int sy = get_global_size(1);
int2 out_coords = (int2) { ((int) xoff + x) % sx, ((int) yoff + y) % sy};
write_imageui(dst_image, out_coords, pixel);
}
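One possible reading of the `pixel = -16184301;` line in the kernel above: as far as I understand OpenCL C, assigning a scalar to a `uint4` copies that scalar into all four components, and writing to a `CL_UNSIGNED_INT8` image saturates each component to the range 0..255. The Java sketch below emulates that arithmetic on the CPU. It is not JOCL code, the names are hypothetical, and the saturation rule is my reading of the specification rather than something I measured on a device.

```java
// Hypothetical CPU emulation of an 8-bit saturating image write; not part of JOCL.
final class SaturatingWrite {

    /** Clamps an unsigned 32-bit channel value to 0..255. */
    static int saturateToByte(long unsignedValue) {
        return (int) Math.min(unsignedValue, 255L);
    }

    /** Emulates writing a uint4 whose four components all equal the same scalar. */
    static int writeBroadcast(int scalar) {
        long asUnsigned = Integer.toUnsignedLong(scalar); // -16184301 becomes 4278782995
        int c = saturateToByte(asUnsigned);
        return (c << 24) | (c << 16) | (c << 8) | c;
    }

    /** Emulates writing a uint4 built from the four separate 8-bit channels of a packed ARGB int. */
    static int writeUnpacked(int argb) {
        int a = saturateToByte((argb >>> 24) & 0xFF);
        int r = saturateToByte((argb >>> 16) & 0xFF);
        int g = saturateToByte((argb >>> 8) & 0xFF);
        int b = saturateToByte(argb & 0xFF);
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        int color = -16184301; // 0xFF090C13, a dark opaque color
        System.out.println(Integer.toHexString(writeBroadcast(color))); // prints ffffffff
        System.out.println(Integer.toHexString(writeUnpacked(color)));  // prints ff090c13
    }
}
```

If this reading is right, every channel ends up at 255, which would match the white output, whereas splitting the packed value into four channels first would keep the intended color.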
Without the call to write_imageui, the background is painted black; otherwise it is white. At the moment I am struggling to understand why pixel = 0 in the C function, but I think someone familiar with JOCL would spot my error in this code very quickly. I am very confused by this code today, and maybe tomorrow will be better, but I don't feel like I will ever catch my mistake by myself. For that reason I am asking for your help to review my code. I feel like an idiot for not being able to figure it out at this point.
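For comparison, here is a minimal CPU reference of what I expect the kernel to produce, written against plain `int[]` pixel arrays so the GPU result can be checked against it. It assumes row-major packed pixels and that the offset should wrap at the destination size (the kernel wraps at the global work size instead, which may or may not be intended). The class and method names are hypothetical.

```java
// Hypothetical CPU reference for the texture blit; useful only for checking the GPU output.
final class CpuBlitReference {

    /**
     * Copies every source pixel into the destination at (xoff + x, yoff + y),
     * wrapping around the destination edges. Pixels that are not covered keep their value.
     */
    static void blit(int[] dst, int dstW, int dstH,
                     int[] src, int srcW, int srcH,
                     float xoff, float yoff) {
        for (int y = 0; y < srcH; y++) {
            for (int x = 0; x < srcW; x++) {
                int ox = Math.floorMod((int) xoff + x, dstW);
                int oy = Math.floorMod((int) yoff + y, dstH);
                dst[oy * dstW + ox] = src[y * srcW + x];
            }
        }
    }

    public static void main(String[] args) {
        int[] background = new int[4 * 3];
        java.util.Arrays.fill(background, 7);   // 4x3 background filled with a marker value
        int[] texture = { 1, 2, 3, 4 };         // 2x2 texture
        blit(background, 4, 3, texture, 2, 2, 3f, 1f);
        System.out.println(java.util.Arrays.toString(background));
    }
}
```

Note that this reference leaves uncovered background pixels untouched. In the JOCL method, the output image is created without host data and then read back over the whole background array, so the original background content is never uploaded to the device, which might explain the black result when nothing is written.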