I'm creating a Vulkan-based renderer backend for my game framework. At the moment I'm loading a mesh with around 10,000 unique triangles (not indexed, all individual) where each vertex has a position value and an RGB value, with no normals and no texture coords. This works out as 72 bytes per triangle, i.e. 3 xyz floats + 3 RGB floats = 6 floats per vertex; 6 * 3 vertices = 18 floats per triangle; 18 * 4 bytes = 72 bytes per triangle. The vertex data is stored in a GPU-local buffer with the VK_BUFFER_USAGE_VERTEX_BUFFER_BIT flag set.
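To make the arithmetic concrete, here is a minimal sketch of the layout described above. The names Vertex, kVertexStride, kTriangleBytes, triangleByteOffset and triangleCount are made up for illustration and are not from my actual code; it also assumes a 4-byte float.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vertex layout: position + colour, no normals or UVs.
struct Vertex {
    float position[3]; // x, y, z
    float color[3];    // r, g, b
};

constexpr std::size_t kVertexStride  = sizeof(Vertex);    // 6 floats * 4 bytes
constexpr std::size_t kTriangleBytes = 3 * kVertexStride; // 3 vertices per triangle

static_assert(kVertexStride == 24, "expected 6 tightly packed 4-byte floats");
static_assert(kTriangleBytes == 72, "expected 18 floats per triangle");

// Byte offset of the n-th triangle in a tightly packed, non-indexed buffer.
constexpr std::size_t triangleByteOffset(std::size_t triangleIndex) {
    return triangleIndex * kTriangleBytes;
}

// Number of whole triangles stored in a buffer of the given size.
inline std::size_t triangleCount(std::size_t bufferBytes) {
    assert(bufferBytes % kTriangleBytes == 0); // buffer must hold whole triangles
    return bufferBytes / kTriangleBytes;
}
```

Passing triangleByteOffset(n) as the bind offset makes the binding start at triangle n, so the first n triangles in the buffer are skipped.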

I'm also using the same vert and frag shaders for all meshes at the moment with push constants for the CPU calculated MVP matrix.

If I use multiples of 72 as the offset param in vkCmdBindVertexBuffers(), then my mesh disintegrates, in that the first triangles in the buffer are never drawn. I've incremented the offset by 72 frame by frame, which dissolved the mesh with no segfaults or errors. LunarG standard validation is enabled with no reported validation errors (I have a lot of error checking and logging in my code).

Incidentally, if I don't use multiples of 72 then I get some very interesting renders, but no crashes! I'm also getting a frame rate of 650 fps on a six-year-old machine running in RenderDoc.

This is the code that binds the vertex buffer...

vkCmdBindVertexBuffers(cmd[swapindex], 0, 1, vertexBuffers, offsets);

Now, just because this runs fine on my PC doesn't mean that it's correct. One thing that I'm confused about is the area of the Vulkan spec regarding memory alignment requirements, specifically in VkPhysicalDeviceLimits.

There are several in VkPhysicalDeviceLimits: minTexelBufferOffsetAlignment, minUniformBufferOffsetAlignment and minStorageBufferOffsetAlignment.

The spec says: The alignment member satisfies the buffer descriptor offset alignment requirements associated with the VkBuffer’s usage:

If usage included VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT or VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT, alignment must be an integer multiple of VkPhysicalDeviceLimits::minTexelBufferOffsetAlignment.

If usage included VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, alignment must be an integer multiple of VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment.

If usage included VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, alignment must be an integer multiple of VkPhysicalDeviceLimits::minStorageBufferOffsetAlignment.

I'm creating the vertex buffer with vkCreateBuffer() using bufferCreateInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT (plus VK_BUFFER_USAGE_TRANSFER_DST_BIT), backed by device-local memory (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT).

The question... As I'm not creating a buffer with VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT, VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT or VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, does that mean that there are no memory alignment requirements for the offset parameter when I call vkCmdBindVertexBuffers(cmd, 0, 1, vertexBuffer, offsets)?

The reason I'm asking is that I want to store more than one mesh in a single vkCreateBuffer() allocated buffer with the VK_BUFFER_USAGE_VERTEX_BUFFER_BIT flag set. I can then offset into this 'super vertex buffer' for each unique mesh I need to draw, without having multiple vertex buffer allocations. I know the limit on memory allocations is usually 4096 (VkPhysicalDeviceLimits::maxMemoryAllocationCount), but rather than allocate multiple vertex buffers I'd prefer to use one 'super buffer' for performance.

Does this make sense?

UPDATE: I've changed my code to use no offset in vkCmdBindVertexBuffers() and instead use the firstVertex param in vkCmdDraw() as a mesh model offset which produced a slightly higher and more stable FPS.

2 Answers

I'm not seeing any alignment requirements in the spec, though I think that's probably an oversight. You might try rounding to a multiple of 16; any actual alignment requirement is unlikely to be larger than that. So if your first mesh is 5 triangles, you need 5*72 bytes for it, and the second mesh would start at offset round_up(5*72, 16)=368. If that doesn't work, you probably have a bug elsewhere.
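As a minimal sketch of that rounding (the helper name roundUp is made up here, and 16 is only the guess from above, not a value taken from the spec):

```cpp
#include <cassert>
#include <cstdint>

// Round value up to the next multiple of alignment.
// alignment must be non-zero; it does not have to be a power of two.
inline std::uint64_t roundUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}
```

For the example above, roundUp(5 * 72, 16) gives 368. Note that 368 is not a multiple of the 24-byte vertex stride, so a mesh placed there can be reached with a bind offset but not with a whole-number firstVertex from offset 0.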

Rather than using offsets to vkCmdBindVertexBuffers, though, you could just bind the full vertex buffer once, and use the firstVertex parameter for each draw to indicate the index into the buffer where the mesh starts.
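A rough sketch of the bookkeeping for the firstVertex approach, assuming the meshes are packed back to back in one buffer and all share the same vertex layout. MeshRange and packMeshes are hypothetical names for illustration only:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Where one mesh lives inside the shared vertex buffer, in vertices (not bytes).
struct MeshRange {
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
};

// Pack meshes back to back and record where each one starts.
inline std::vector<MeshRange> packMeshes(const std::vector<std::uint32_t>& vertexCounts) {
    std::vector<MeshRange> ranges;
    ranges.reserve(vertexCounts.size());
    std::uint32_t next = 0;
    for (std::uint32_t count : vertexCounts) {
        ranges.push_back({next, count});
        next += count; // the next mesh starts right after this one
    }
    return ranges;
}

// With Vulkan, the buffer is bound once at offset 0 and each mesh is drawn with:
//   vkCmdDraw(cmd, range.vertexCount, 1, range.firstVertex, 0);
```

Because firstVertex counts whole vertices, every mesh automatically starts on a multiple of the vertex stride.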

Jesse Hall
  • For pure functionally-correct alignment then 16 bytes is probably fine, but on mobile it's worth noting that most of the memory coherency protocols operate on cache lines which are 64 byte aligned. If you have concurrent CPU modifications in the buffer at the same time as GPU access you really want to space those concurrent accesses at least 64 bytes apart and on 64 byte alignment to get best cache usage. – solidpixel Feb 06 '19 at 09:51
  • Thanks for the advice Jesse. I didn't think there was an alignment issue either but as you suggested, I've changed my code to use no offset in vkCmdBindVertexBuffers() and instead use the firstVertex param in vkCmdDraw() as a mesh model offset which produced a slightly higher and more stable FPS. Thanks again! –  Feb 06 '19 at 16:53

The alignment requirements are outlined in the Vulkan spec here.

Basically a vertex attribute must be aligned with the component type of the input attribute. Packed formats have slightly different requirements.

For example, VK_FORMAT_R32G32B32_SFLOAT would be 4-byte aligned since its components are 4 bytes, while a format with 64-bit components would be 8-byte aligned.
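A small sketch of that check for non-packed formats, as I read the rule. The function and parameter names are made up for illustration, and it only looks at offsets within the buffer, assuming the buffer's own base address is already suitably aligned:

```cpp
#include <cassert>
#include <cstdint>

// True when the attribute's byte position inside the buffer is a multiple
// of its component size (4 for 32-bit floats, 8 for 64-bit components).
inline bool isAttributeAligned(std::uint64_t bindingOffset,   // offset passed to vkCmdBindVertexBuffers
                               std::uint64_t stride,          // bytes per vertex
                               std::uint64_t vertexIndex,     // which vertex is fetched
                               std::uint64_t attributeOffset, // offset of the attribute within a vertex
                               std::uint64_t componentBytes) {
    assert(componentBytes != 0);
    const std::uint64_t position = bindingOffset + stride * vertexIndex + attributeOffset;
    return position % componentBytes == 0;
}
```

With the layout from the question (24-byte stride, float components), any bind offset that is a multiple of 72 passes this check, while an offset such as 70 does not.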

If the vertex attributes are a mix of sizes, so that one vertex ends on a boundary that leaves the first attribute of the next vertex unaligned, then that attribute should be moved to the correct boundary (like zero filling between attributes when they are built).

Also, in Vulkan, attributes are loaded as 16-byte chunks. So a vec3, which is 12 bytes, would consume 16 bytes; it is zero extended to 16 bytes and it is not possible to access the extended data in the vertex shader. It is possible to put attributes into unused components. For example:

layout( location=0 ) in vec3 position; // uses components 0,1,2, but 3 is empty
layout( location=0, component=3) in float u; // use the next 4 bytes

or just use 16 byte types, and back them with real data:

layout( location=0) in vec4 position;

Aligning data on a larger boundary like 64 bytes may help with cache coherency, but the GPU is streaming the data, so a larger mesh that takes advantage of the stream will give better results than worrying too much about data alignment.

pmw1234