I am writing an OpenCL kernel that captures the triangles in my current GL scene.
To do that, I pass my vertex stream and my indices to the kernel.
Here is the declaration of the kernel entry point:
__kernel
void CaptureTriangles(
    const uint NumTriangles,
    const float16 WorldMatrix,
    __constant ushort3 *IndexDataBlock,
    __constant struct Vertex *DataBlock,
    __global struct Triangle *TriangleBuffer,
    __global uint *TriangleBufferCount)
The Vertex structure is defined as follows:
struct Vertex
{
    float3 position;
    float3 normal;
    float materialIndex;
}
__attribute__((packed));
This stream was created through GL, and that is how the data is laid out: seven tightly packed floats per vertex.
When fetching the triangles, I do the following in the kernel:
const ushort3 idx = IndexDataBlock[get_global_id(0)];
const struct Vertex v0 = DataBlock[idx.x],
                    v1 = DataBlock[idx.y],
                    v2 = DataBlock[idx.z];
But OpenCL seems to keep re-aligning the Vertex struct to its own internal requirements, even though it is declared with __attribute__((packed)).
So the triangles are never captured properly.
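One likely explanation, offered as a sketch rather than a confirmed diagnosis: the OpenCL C specification gives a 3-component vector such as float3 the same size as a float4 (16 bytes), so packed can remove alignment padding between members but cannot shrink the members themselves. The host-side C below emulates that rule to compare the two layouts. The names HostVertex, EmulatedFloat3, and EmulatedDeviceVertex are hypothetical, and the sketch assumes a 4-byte float.

```c
#include <stddef.h>

/* Host/GL view of the stream: 7 tightly packed floats per vertex. */
typedef struct {
    float position[3];
    float normal[3];
    float materialIndex;
} HostVertex;

/* Stand-in for an OpenCL C float3, which is sized like a float4.
   The fourth lane is padding. Hypothetical name, for illustration only. */
typedef struct {
    float lanes[4];
} EmulatedFloat3;

/* How the kernel-side struct Vertex would be laid out under that rule,
   even with all inter-member padding removed. */
typedef struct {
    EmulatedFloat3 position;
    EmulatedFloat3 normal;
    float materialIndex;
} EmulatedDeviceVertex;

/* Stride between consecutive vertices in each view, in bytes. */
size_t host_vertex_stride(void)   { return sizeof(HostVertex); }
size_t device_vertex_stride(void) { return sizeof(EmulatedDeviceVertex); }

/* Byte offset of the normal in each view. */
size_t host_normal_offset(void)   { return offsetof(HostVertex, normal); }
size_t device_normal_offset(void) { return offsetof(EmulatedDeviceVertex, normal); }
```

Under these assumptions the host stride is 28 bytes while the emulated device stride is 36 bytes, so DataBlock[idx.x] would step through the buffer at the wrong stride and read the normal from the wrong offset.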
Switching from __constant struct Vertex *DataBlock to __constant float *DataBlock and fetching each float explicitly in the kernel fixes the issue.
This version, which reads float by float, works:
// __constant float *DataBlock
float4 p0 = (float4)(DataBlock[7 * idx.x + 0], DataBlock[7 * idx.x + 1], DataBlock[7 * idx.x + 2], 1.0f),
       p1 = (float4)(DataBlock[7 * idx.y + 0], DataBlock[7 * idx.y + 1], DataBlock[7 * idx.y + 2], 1.0f),
       p2 = (float4)(DataBlock[7 * idx.z + 0], DataBlock[7 * idx.z + 1], DataBlock[7 * idx.z + 2], 1.0f);
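The index arithmetic in the float-by-float version can be checked on the host with a small plain C equivalent. This is only a sketch: FLOATS_PER_VERTEX, Position4, and fetch_position are hypothetical names that are not part of the kernel, and the stride of 7 is taken from the code above.

```c
#include <stddef.h>

/* 3 position floats + 3 normal floats + 1 material index. */
enum { FLOATS_PER_VERTEX = 7 };

/* Plain C stand-in for the float4 built in the kernel. */
typedef struct {
    float x, y, z, w;
} Position4;

/* Read the position of one vertex from a tightly packed float stream
   and append w = 1.0f, mirroring the float-by-float kernel code. */
Position4 fetch_position(const float *data, unsigned vertex_index)
{
    const float *v = data + (size_t)FLOATS_PER_VERTEX * vertex_index;
    Position4 p = { v[0], v[1], v[2], 1.0f };
    return p;
}
```

In the kernel itself the same read could probably be written with the OpenCL C built-in vload3, for example vload3(0, DataBlock + 7 * idx.x), which loads three consecutive floats without any float3 alignment requirement. I have not verified that variant against this kernel.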
I'd rather use the struct Vertex syntax for code clarity. Is there any way to get OpenCL not to re-align structs?