Non-interleaved vertex buffers DirectX11

Question

If my vertex positions are shared, but my normals and UVs are not (to preserve hard edges and the like), is it possible to use non-interleaved buffers in DirectX 11 to represent this in memory, so that I can still use an index buffer with it? Or should I stick with duplicated vertex positions in an interleaved buffer?

And are there any performance concerns when choosing between interleaved and non-interleaved vertex buffers? Thank you!

Just to clarify a few points: I am currently using an interleaved buffer all the time and the only case where I want to use a non-interleaved buffer is to playback vertex cache files (send them directly to GPU without reordering strides to match interleaved setup). — Erunehtar, Nov 07 '13 at 21:03

score 14 · Accepted Answer · edited Jun 20 '20 at 09:12

How to

There are several ways. I'll describe the simplest one.

Just create separate vertex buffers:

ID3D11Buffer* positions;
ID3D11Buffer* texcoords;
ID3D11Buffer* normals;

Create the input layout elements, incrementing the InputSlot member for each component:

{ "POSITION",  0,  DXGI_FORMAT_R32G32B32_FLOAT,  0, 0,                            D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "TEXCOORD",  0,  DXGI_FORMAT_R32G32_FLOAT,     1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "NORMAL",    0,  DXGI_FORMAT_R32G32B32_FLOAT,  2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
                                             //  ^
                                             // InputSlot

Bind the buffers to their slots (preferably all in one call):

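// Array of buffer pointers, one per input slot.
ID3D11Buffer* vbs[] = { positions, texcoords, normals };
// Strides follow from the formats in the input layout above: float3, float2, float3.
unsigned int strides[] = { 3 * sizeof(float), 2 * sizeof(float), 3 * sizeof(float) };
// Assumption: the data in each buffer starts at byte 0.
unsigned int offsets[] = { 0, 0, 0 };
m_Context->IASetVertexBuffers(0, 3, vbs, strides, offsets);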

Draw as usual. You don't need to change your HLSL code: the vertex shader receives the same inputs as it would from a single buffer.

Note that these code snippets were written on the fly and may contain mistakes.

Edit: you can improve this approach by grouping buffers by update rate: if texcoords and normals never change, merge them into one buffer.

As for performance

It is all about locality of reference: the closer together the data, the faster the access.

An interleaved buffer, in most cases, gives (by far) better performance on the GPU side (i.e. rendering): for each vertex, all attributes sit next to each other. But separate buffers give faster CPU access: each array is contiguous, so each element sits right after the previous one.

So, overall, the performance concerns depend on how often you write to the buffers. If your limiting factor is CPU writes, stick with separate buffers. If not, go for a single one.
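The CPU-side difference can be sketched like this. It is a minimal illustration in plain C++, not D3D11 code: `Float2`, `Float3`, `Vertex`, `UpdateSeparate` and `UpdateInterleaved` are hypothetical names, and the vectors merely stand in for mapped buffer memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Hypothetical interleaved layout: POSITION, TEXCOORD, NORMAL per vertex.
struct Vertex {
    Float3 position;
    Float2 texcoord;
    Float3 normal;
};

// Separate streams: the position array is contiguous, so one memcpy updates it.
void UpdateSeparate(std::vector<Float3>& positionStream,
                    const std::vector<Float3>& newPositions) {
    assert(positionStream.size() == newPositions.size());
    if (newPositions.empty()) {
        return;
    }
    std::memcpy(positionStream.data(), newPositions.data(),
                newPositions.size() * sizeof(Float3));
}

// Interleaved: positions are sizeof(Vertex) bytes apart, so each one is
// written individually while texcoords and normals are left untouched.
void UpdateInterleaved(std::vector<Vertex>& vertices,
                       const std::vector<Float3>& newPositions) {
    assert(vertices.size() == newPositions.size());
    for (std::size_t i = 0; i < newPositions.size(); ++i) {
        vertices[i].position = newPositions[i];
    }
}
```

Which of the two is faster in practice still depends on the driver and on how the buffer is mapped, so treat this only as a picture of the access patterns.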

How will you know? There is only one way: profile. Profile both the CPU side and the GPU side (via the graphics debugger/profiler from your GPU's vendor).

Other factors

The best practice is to limit CPU writes, so if you find that you are limited by buffer updates, you probably need to review your approach. Do you need to update the buffer every frame if you are running at 500 fps? The user won't see the difference if you reduce the buffer update rate to 30-60 times per second (decouple buffer updates from frame updates). So, if your update strategy is reasonable, you will likely never be CPU-limited, and the best approach is classic interleaving.
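One way to decouple buffer updates from the frame rate is a small time accumulator. The sketch below is only an illustration: `UpdateGate` and its members are hypothetical names, and times are integer microseconds to keep the arithmetic exact.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: lets a buffer update run at a fixed rate that is
// independent of the frame rate.
class UpdateGate {
public:
    explicit UpdateGate(std::int64_t intervalMicros)
        : intervalMicros_(intervalMicros) {
        assert(intervalMicros > 0);
    }

    // Call once per frame with the elapsed frame time. Returns true when
    // enough time has accumulated that the buffer should be refreshed.
    bool ShouldUpdate(std::int64_t frameMicros) {
        accumulatedMicros_ += frameMicros;
        if (accumulatedMicros_ < intervalMicros_) {
            return false;
        }
        accumulatedMicros_ -= intervalMicros_;
        return true;
    }

private:
    std::int64_t intervalMicros_;
    std::int64_t accumulatedMicros_ = 0;
};
```

With an interval of 33333 microseconds and 2000-microsecond frames (500 fps), the gate opens 30 times over 500 frames, so the buffer is rewritten 30 times per second instead of 500.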

You can also consider redesigning your data pipeline, or even preparing the data offline (we call it "baking"), so that you do not need to cope with non-interleaved buffers. That would be quite reasonable too.

Reduce memory footprint or increase performance?

This is the memory-versus-performance tradeoff, the eternal question. Duplicate memory to take advantage of interleaving? Or not?

The answer is... "that depends". Are you programming a new CryEngine, targeting top GPUs with gigabytes of memory? Or are you programming for embedded systems or mobile platforms, where memory is slow and limited? Is 1 megabyte of memory worth the hassle at all? Or do you have huge models, 100 MB each? We don't know.

It's all up to you to decide. But remember: there is no free lunch. If you find the memory savings worth the performance loss, do it. Profile and compare to be sure.
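To put rough numbers on the memory side, here is a small sketch. It assumes the float3/float2/float3 layout used earlier; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPositionBytes = 3 * sizeof(float);
constexpr std::size_t kTexcoordBytes = 2 * sizeof(float);
constexpr std::size_t kNormalBytes   = 3 * sizeof(float);

// Total vertex data when every vertex carries its own copy of all attributes.
constexpr std::size_t DuplicatedBytes(std::size_t vertexCount) {
    return vertexCount * (kPositionBytes + kTexcoordBytes + kNormalBytes);
}

// Bytes spent on position copies beyond the unique positions.
std::size_t RepeatedPositionBytes(std::size_t vertexCount,
                                  std::size_t uniquePositions) {
    assert(uniquePositions <= vertexCount);
    return (vertexCount - uniquePositions) * kPositionBytes;
}
```

For a hard-edged cube (24 vertices, 8 unique positions) and 4-byte floats this gives 768 bytes in total, of which 192 are repeated positions. Whether that fraction matters is the judgment call described above.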

Hope it helps somehow. Happy coding! =)

this is a very good summary of all the questions I'm asking myself. Doing a pre-bake in my case is not suitable; we want the file to open fast, and we don't want to waste time and memory to create a new point cache file that is an interleaved format of the original thing. Now I think you are right that I might need to profile it. Right now I only use interleaved buffers, so I have to use for loops to copy my new vertex positions into the buffer. I will try with non-interleaved buffer and see if a simple memcpy will work better. Also, scene size may vary A LOT. — Erunehtar, Nov 07 '13 at 16:03
Oh, and I forgot to mention, my renderer runs on classic desktop computers as well as mobile devices. In other words, I need to support both cases. Also the content of what is getting rendered is completely out of my control, it can be anything since the tool is used to view arbitrary 3D scenes, ranging from 1kb size to hundreds of megabytes (even gigabytes sometimes). — Erunehtar, Nov 07 '13 at 16:08

score 2 · Answer 2 · answered Nov 07 '13 at 16:51

Interleaved versus separate will mostly affect your Input Assembler stage (GPU side).

A perfect scenario for interleaved is when your buffer's memory layout exactly matches your vertex shader input, so the Input Assembler can simply fetch the data.

In that case you'll be totally fine with interleaved. Even so, when I tested with a large model (two versions of the same data, one interleaved, one separate), a timestamp query didn't report any major difference (with pretty minimal vertex processing and a basic pixel shader).

Having separate buffers, however, makes it much easier to fine-tune when you use your geometry in different contexts.

Let's say you have Position/Normals/UV (like in your case).

Now suppose you also have a shader in your pipeline that only requires Position (a shadow map pass is a pretty good example).

With separate buffers, you can simply create a new input layout that contains position only, and bind just that buffer. Your IA stage then only has to load that buffer. Best of all, you can even do this dynamically using shader reflection.

If you bind interleaved data, you will have some overhead because it has to be loaded with a stride.

When I tested that case I saw about a 20% gain using separate instead of interleaved, which is quite decent, but since this type of processing can be largely architecture dependent, don't take it for granted (I tested on an NVIDIA 740M).
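The idea of binding only what a pass needs can be sketched without any D3D11 calls. Everything here is hypothetical (`Stream`, `SelectStreams`, `BytesPerVertex`), and the list of required semantics is assumed to come from somewhere such as shader reflection.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one separate vertex buffer.
struct Stream {
    std::string semantic;     // e.g. "POSITION"
    std::size_t strideBytes;  // bytes per vertex in this buffer
};

// Returns the indices of the streams a shader actually needs, in the order
// the shader lists its required semantics. Unknown semantics are skipped.
std::vector<std::size_t> SelectStreams(const std::vector<Stream>& available,
                                       const std::vector<std::string>& required) {
    std::vector<std::size_t> selected;
    for (const std::string& semantic : required) {
        for (std::size_t i = 0; i < available.size(); ++i) {
            if (available[i].semantic == semantic) {
                selected.push_back(i);
                break;
            }
        }
    }
    return selected;
}

// Bytes the input assembler has to read per vertex for the selected streams.
std::size_t BytesPerVertex(const std::vector<Stream>& available,
                           const std::vector<std::size_t>& selected) {
    std::size_t total = 0;
    for (std::size_t index : selected) {
        assert(index < available.size());
        total += available[index].strideBytes;
    }
    return total;
}
```

For a shadow pass that only requires POSITION, this selects a single 12-byte stream instead of stepping through 32-byte interleaved vertices. The selected indices would then drive which buffers and input layout elements you actually bind.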

So simply put, profile (a lot), and check which gives you the best balance between your GPU and CPU loads.

Please also note that the relative overhead of the Input Assembler shrinks as the complexity of your shaders grows: if you apply some heavy calculations, add some tessellation and some decent shading, the time difference between interleaved and non-interleaved becomes progressively meaningless.

score 0 · Answer 3 · answered Nov 06 '13 at 20:39

You should stick with interleaved buffers. Any other technique will require some form of indirection to your non-duplicated position buffer, which will cost you performance and cache efficiency.
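The reason is that the input assembler applies one index to every bound per-vertex buffer, so a position cannot be paired with two different normals through the index buffer alone. A minimal sketch of the usual workaround, duplicating vertices per unique (position, normal) pair, might look like this. `Corner`, `UnifiedMesh` and `Unify` are hypothetical names, and UVs are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One triangle corner as stored in a separately indexed source mesh.
struct Corner {
    std::uint32_t position;  // index into the shared position array
    std::uint32_t normal;    // index into the normal array
};

struct UnifiedMesh {
    std::vector<std::uint32_t> sourcePosition;  // per output vertex
    std::vector<std::uint32_t> sourceNormal;    // per output vertex
    std::vector<std::uint32_t> indices;         // single index buffer
};

// Emits one output vertex per unique (position, normal) pair and a single
// index buffer that refers to those output vertices.
UnifiedMesh Unify(const std::vector<Corner>& corners) {
    UnifiedMesh mesh;
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint32_t> seen;
    for (const Corner& corner : corners) {
        const auto key = std::make_pair(corner.position, corner.normal);
        auto found = seen.find(key);
        if (found == seen.end()) {
            const auto newIndex =
                static_cast<std::uint32_t>(mesh.sourcePosition.size());
            found = seen.emplace(key, newIndex).first;
            mesh.sourcePosition.push_back(corner.position);
            mesh.sourceNormal.push_back(corner.normal);
        }
        mesh.indices.push_back(found->second);
    }
    assert(mesh.sourcePosition.size() == mesh.sourceNormal.size());
    return mesh;
}
```

The `sourcePosition` table can be kept around: when new positions arrive, each output vertex knows which source position to copy, whether the destination is interleaved or a separate position buffer.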

MooseBoys

Ok, then what if I want to change the vertex positions? I will need to code a _for loop_ to update the values instead of using a memcpy? Won't that make it much slower? – Erunehtar Nov 06 '13 at 21:05
@Deathicon Are you procedurally generating your vertex buffer? If not, you should bake the duplicated vertices and just do a copy. If so, hopefully you're rendering this geometry many times per generation, so the CPU overhead doesn't matter. If you're generating new geometry every frame, you're likely CPU bottlenecked and so the GPU efficiency of indirection doesn't really matter. – MooseBoys Nov 06 '13 at 21:19
I'm reading vertex positions from a point cache, which obviously returns me non-interleaved data (positions, normals and etc. are all in separate buffers) so I have to use a for loop to copy that data into the interleaved vertex buffer. I was just wondering if there was a better way to do this. – Erunehtar Nov 06 '13 at 21:30
@Deathicon This is a 3ds point cache? Your best bet is probably to have some kind of offline pre-processing that converts the point cache file to a friendlier (i.e. flat) format. Point cache should be considered an author-time format, not a run-time format. – MooseBoys Nov 06 '13 at 21:36
