
I'm trying to decide on the most efficient way to render a large number of cubes with different textures in a Minecraft-like game.

I discovered instanced rendering. I created a single "cube model" that stores the vertices, normals, and texture coordinates for a cube, built an array buffer from it, and passed it to the GPU once. Then I created an array of (translation vector, texture index) structs and used instanced rendering to draw the same cube over and over, each time translated and with the appropriate texture.
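
For illustration, here is a minimal sketch of how a per-instance (translation vector, texture index) array like this might be laid out in one flat buffer before uploading. The class and method names are hypothetical. It assumes each instance is three 32-bit floats followed by one 32-bit int (16 bytes). It only builds the buffer; the upload and attribute setup depend on your GL binding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: packs (x, y, z, textureIndex) instances into one
// interleaved buffer. Per-instance layout: float x, float y, float z, int tex
// = 16 bytes, which would be the stride of the per-instance attributes.
final class InstanceBuffer {
    static final int STRIDE_BYTES = 3 * Float.BYTES + Integer.BYTES; // 16

    static ByteBuffer pack(float[][] translations, int[] textureIndices) {
        if (translations.length != textureIndices.length) {
            throw new IllegalArgumentException("one texture index per instance");
        }
        ByteBuffer buf = ByteBuffer
                .allocateDirect(translations.length * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder()); // GL reads native byte order
        for (int i = 0; i < translations.length; i++) {
            buf.putFloat(translations[i][0])
               .putFloat(translations[i][1])
               .putFloat(translations[i][2])
               .putInt(textureIndices[i]);
        }
        buf.flip(); // position back to 0 so the upload call reads everything
        return buf;
    }
}
```

The attributes read from this buffer would get a vertex attribute divisor of 1, so they advance once per instance instead of once per vertex.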

(Hopefully Notch doesn't mind me using his textures until I make my own)

The problem is that not all 6 sides of a block will always have the same texture, and I'm trying to figure out how to make them differ per block type. The two solutions I've come up with are:

  1. Use a different model for each block type. This way I can specify different texture coordinates on each of the vertices. I can still use instanced rendering, but I'd have to do a separate pass for each block type.
  2. Pass 6 "texture indexes" (1 for each face) per block instead of 1.
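
For option 1, one way to organize the per-type passes is to bucket the instances by block type whenever the set of drawn blocks changes. You then issue one instanced draw per non-empty bucket. This is a sketch; `Instance` and `PassBuilder` are hypothetical names, and it assumes Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-instance record for option 1: only a position is needed
// per instance, because the texture coordinates live in each block type's model.
record Instance(int x, int y, int z, int blockType) {}

final class PassBuilder {
    // Groups instances so that each block type becomes one instanced draw.
    // TreeMap keeps the pass order stable (sorted by block type id).
    static Map<Integer, List<Instance>> groupByType(List<Instance> instances) {
        Map<Integer, List<Instance>> passes = new TreeMap<>();
        for (Instance inst : instances) {
            passes.computeIfAbsent(inst.blockType(), k -> new ArrayList<>()).add(inst);
        }
        return passes;
    }
}
```

Each map entry corresponds to one bind-model-and-draw pass. With 256 block types that means at most 256 draws, and fewer when some types are absent from the scene.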

The 2nd solution requires passing a lot more, possibly redundant, data. I'm not sure how large the benefits of instanced rendering are, so I don't know whether it would be better to do, say, up to 256 "passes" (1 for each block type) or "one big pass" with all the data that renders every block in one shot.
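
Here are some rough numbers for "a lot more data". They assume each instance stores its translation as three 32-bit floats and its texture indices as 32-bit ints. They use the 200K-cube figure mentioned later in the thread, with MB meaning 10^6 bytes:

$$
\begin{aligned}
\text{option 1: } & 3 \times 4\,\text{B} + 1 \times 4\,\text{B} = 16\,\text{B/instance}, & 16 \times 200{,}000\,\text{B} &= 3.2\,\text{MB} \\
\text{option 2: } & 3 \times 4\,\text{B} + 6 \times 4\,\text{B} = 36\,\text{B/instance}, & 36 \times 200{,}000\,\text{B} &= 7.2\,\text{MB} \\
\text{option 2, 8-bit indices: } & 3 \times 4\,\text{B} + 6 \times 1\,\text{B} \rightarrow 20\,\text{B (padded to two ints)}, & 20 \times 200{,}000\,\text{B} &= 4.0\,\text{MB}
\end{aligned}
$$

So the per-face variant roughly doubles the instance data unless the indices are packed.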

Or perhaps there's another method I'm not aware of?

mpen

4 Answers


I don't think you can do this efficiently with instancing. The vast majority of faces/cubes are never visible, and you can save a lot of time by not rendering them. Unfortunately, that makes every cube a different object.

The standard solution (and how it's done in Minecraft) is to divide your terrain into sectors. Compute which faces are visible and upload them to the GPU. When a cube changes, you only need to re-upload its sector. When rendering a sector, you just draw its primitives without any further computation.
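
Below is a minimal sketch of the "compute which faces are visible" step for one sector. It assumes the sector is a dense 3D array of block ids where 0 means air. It treats anything outside the sector as air; a real engine would look into the neighbouring sector instead. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SectorMesher {
    static final int AIR = 0;
    // Neighbour offsets for the six faces: +x, -x, +y, -y, +z, -z.
    static final int[][] FACE_DIRS = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };

    // Returns {x, y, z, face} for every face of a solid block whose neighbour
    // in that direction is air, i.e. the faces worth uploading to the GPU.
    static List<int[]> visibleFaces(int[][][] blocks) {
        List<int[]> faces = new ArrayList<>();
        for (int x = 0; x < blocks.length; x++)
            for (int y = 0; y < blocks[x].length; y++)
                for (int z = 0; z < blocks[x][y].length; z++) {
                    if (blocks[x][y][z] == AIR) continue;
                    for (int f = 0; f < 6; f++) {
                        int nx = x + FACE_DIRS[f][0];
                        int ny = y + FACE_DIRS[f][1];
                        int nz = z + FACE_DIRS[f][2];
                        if (blockAt(blocks, nx, ny, nz) == AIR) faces.add(new int[] {x, y, z, f});
                    }
                }
        return faces;
    }

    // Out-of-range coordinates count as air (the simplification noted above).
    static int blockAt(int[][][] b, int x, int y, int z) {
        if (x < 0 || x >= b.length || y < 0 || y >= b[x].length || z < 0 || z >= b[x][y].length) {
            return AIR;
        }
        return b[x][y][z];
    }
}
```

A fully buried block contributes no faces. When a cube changes, rerunning this over just that sector produces the data to re-upload.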

You could also do something based on sparse voxel octrees. It's much more work, but it would let you efficiently and accurately determine which parts of your world are visible.

Piotr Praszmo
  • Good point, but here, I'd give a simple "instance culling" pass using geometry shaders and transform feedback a chance first, which can be combined with octrees and partitions later on. – Sam Feb 05 '12 at 20:28
  • Things suddenly got a lot more complicated. I was going to decide how to do the textures first and then look into pruning methods. I thought I'd end up pruning the non-visible cubes, but now that you mention it, doing it based on faces makes more sense. This raises a few new questions, though. That essentially means I'm doing the culling/pruning on the CPU, no? Wouldn't that a) be slower, and b) reimplement all the logic the GPU normally does for me? With regard to octrees, are you suggesting I keep the terrain data in such a data structure, or would I use it for pruning purposes... – mpen Feb 05 '12 at 20:47
  • ...only? [This article](http://0fps.wordpress.com/2012/01/14/an-analysis-of-minecraft-like-engines/) suggests that octrees are actually inefficient for storing the terrain, since each access has the overhead of going through `m` layers of the octree. Lastly, if I prune out the majority of the cubes/faces, what happens when I want to do lighting? Won't some relevant data be missing from the GPU/GLSL? I.e., a face might not be visible to the viewer, but it can still block light. Or are my dreams of nice shadows at this scale unrealistic to begin with? – mpen Feb 05 '12 at 20:48
  • When it comes to light and shadow, which require individually transformed render passes, instance culling can be applied to each. As long as the lights and the cubes they light don't move or change, the culling/shadow-map pass need not be redone. I don't think it's unrealistic. To look up which light cone or partition a cube changed in (user interactions), the CPU is the right place; it can then initiate GPU-supported processing of that section. – Sam Feb 05 '12 at 21:06
  • I suggest implementing instanced rendering within a reusable 'section' class, which can have whatever shape is needed for intersection tests of cubes against lights or partitions. This section class should have its own instance array containing each cube it intersects. The array is filled on init and modified when a cube leaves or enters the section. – Sam Feb 05 '12 at 21:16
  • @Mark, you only remove the faces that are never visible, so you don't have to re-upload them when the camera rotates. You can definitely move the culling pass to a geometry shader; you just need to find an efficient way to send the cubes to the GPU. Maybe 3D textures? You are right about octrees: they perform well if the world is very irregular, which is not the case in Minecraft. – Piotr Praszmo Feb 05 '12 at 22:03
  • @Banthar: If I should prune the faces that are never visible on the CPU (I guess just the ones buried underground)...then I should probably convert to some face-based structure rather than a cube-based one (using a 3D array in C# right now)? As for an efficient way of sending the cubes, what's wrong with just sending 1 cube model + the translation instance data via `glVertexAttribDivisor` and `glDrawElementsInstancedBaseVertex`? As I understand it, those were designed for this purpose and texture or uniform buffers are the old school way of doing this? (I never learned either of those) – mpen Feb 05 '12 at 22:44
  • I just read through most of [this article](http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/) which talks about using the geometry shader for culling, but what I gather from it is that he just sends down a single central point + an extents vector for each model instance to the vertex shader so that each vertex really represents the location of an object, then he computes bounding boxes for each of them and prunes out any ones that don't fit within the viewing frustum. I can see how this would help with high-poly models, but my models are already as simple as they get. – mpen Feb 05 '12 at 22:48
  • i.e., my cubes are already bounding boxes essentially, so I can't really send any less data down than I already am...well maybe a little bit, but... I don't think I'm going to get much benefit out of it. Plus, in my example 90% of the blocks are within the viewing frustum, it's the ones that are buried that I need to cull. Hrm.. if I just do one pass on the CPU side of things and check the 6 main sides, I can prune it out that way. I think I see now. But pruning wasn't the goal of this SO question...I'll ask about that later if I get stuck again. – mpen Feb 05 '12 at 22:54
  • @Mark, arrays are very efficient. If you transform the array into individual cubes, you need to add information about their neighbors. That's 6x more information for individual cubes and 24x more for individual faces. The culling has to be done as part of this transformation, or you will end up transferring much more data than necessary. – Piotr Praszmo Feb 05 '12 at 23:08
  • The idea is to send a 3D texture of your terrain, plus the indexes of the cubes you want drawn, to the GPU. The geometry shader then decides which sides, if any, can be seen (i.e. are external). If you send individual cubes plus their 6 neighbors to the GPU, that's 6x more data. The cost of this transfer will likely exceed the benefit of doing the culling on the GPU. – Piotr Praszmo Feb 05 '12 at 23:23
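
Sam's "section" idea in the comments above could look roughly like this. Each section owns its own instance array, filled on init and patched when a cube enters or leaves. The class, the int-encoded cube key, and the swap-with-last removal are all illustrative choices, not something from the thread. The point is that each edit touches at most two slots, so the array stays packed and ready to draw.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical section keeping a dense, draw-ready array of cube keys.
// Removal moves the last element into the hole, so an instanced draw can
// always use instance count = size().
final class Section {
    private int[] instances = new int[16]; // e.g. packed cube coordinates
    private int size = 0;
    private final Map<Integer, Integer> slotOf = new HashMap<>();

    void add(int cube) {
        if (slotOf.containsKey(cube)) return; // already in this section
        if (size == instances.length) instances = Arrays.copyOf(instances, size * 2);
        instances[size] = cube;
        slotOf.put(cube, size++);
    }

    void remove(int cube) {
        Integer slot = slotOf.remove(cube);
        if (slot == null) return; // not in this section
        int last = instances[--size];
        if (slot != size) { // move the last cube into the freed slot
            instances[slot] = last;
            slotOf.put(last, slot);
        }
    }

    int size() { return size; }
    int get(int i) { return instances[i]; }
}
```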

I know this question is almost two years old, but I would make a 3D texture that stores all of the individual block textures, where the z texture coordinate acts like the block ID. With the 3D texture, you can bind all of your individual block textures at once. You can then use instanced rendering to pass in your transformations along with a block ID that selects the correct block texture from the 3D sampler.
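
One detail matters when using the z coordinate of a 3D texture as a block ID. Normalized texture coordinates address texel centres, so with `depth` slices, slice `i` sits at `(i + 0.5) / depth`. Sampling at exactly `i / depth` lands on the boundary between two slices, and with linear filtering neighbouring block textures would blend. Array textures avoid this, because their layer index is not filtered. A small helper, with hypothetical names:

```java
final class BlockTexture3D {
    // Returns the normalized r (z) texture coordinate of the centre of the
    // slice holding blockId, in a 3D texture with `depth` slices.
    static float sliceCoord(int blockId, int depth) {
        if (blockId < 0 || blockId >= depth) {
            throw new IllegalArgumentException("blockId out of range: " + blockId);
        }
        return (blockId + 0.5f) / depth;
    }
}
```

The same formula would be applied in the vertex shader to the per-instance block ID before sampling.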

Nico Cvitak

Really late answer, but somebody's bound to need it.

Have a method in the base block class that returns the texture, with a parameter for the face. In the individual block classes that need multiple textures, override this method and use a switch statement or a series of if/else statements.

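This is what the override would look like in a grass block class:

```java
// Grass block: different textures on the top, bottom, and sides.
// FACE_TOP, FACE_BOTTOM and the GRASS_* texture ids are placeholder
// compile-time int constants assumed to be defined elsewhere (e.g. in Block).
class GrassBlock extends Block {
    @Override
    public int getBlockTexture(int face) {
        switch (face) {
            case FACE_TOP:    return GRASS_TOP;
            case FACE_BOTTOM: return GRASS_BOTTOM;
            default:          return GRASS_SIDE; // all four side faces
        }
    }
}
```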

As for how you use this in the renderer, look up the texture before you render each face, similar to how you do culling.
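
Because the renderer asks for one texture per face, you could call `getBlockTexture` once per block type and cache the six results. The per-face lookup while building meshes then becomes an array read. This sketch assumes a minimal hypothetical `TexturedBlock` interface standing in for the block class, with faces numbered 0 to 5.

```java
// Hypothetical stand-in for the block class described above.
interface TexturedBlock {
    int getBlockTexture(int face); // face in 0..5
}

final class FaceTextureTable {
    // Precomputes the 6 face textures for every block type (indexed by id),
    // so mesh building can read table[blockId][face] directly.
    static int[][] build(TexturedBlock[] blockTypes) {
        int[][] table = new int[blockTypes.length][6];
        for (int id = 0; id < blockTypes.length; id++) {
            for (int face = 0; face < 6; face++) {
                table[id][face] = blockTypes[id].getBlockTexture(face);
            }
        }
        return table;
    }
}
```

The same [type][face] table could also be uploaded to the GPU once, for example in a texture buffer object as suggested in another answer. Each instance would then carry only its block type rather than six indices.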


On my nVidia 8600M GT I found that instancing performs best "in the middle", with moderate vertex and instance counts. Still, I ended up instancing a couple of vertices thousands of times to eliminate redundant data, along with the effort of updating it.

I'd choose 2. Use a texture array along with a single instanced cube in the vertex array, and select each face's texture using the texture indices in your per-instance array; the 6 indices may even be packed into a few integers. For supplying instance attributes, GL_ARB_instanced_arrays can also be useful, since it avoids having to fetch from a buffer using gl_InstanceID (the access is predictable and therefore faster in most cases). If you need instance-specific texture coordinates, I'd bind an additional per-instance, per-vertex texture coordinate array, along with an accordingly modified shader.
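
Packing the 6 indices into a few integers works out neatly if there are at most 256 textures. Each index then fits in 8 bits, so four faces go in one 32-bit int and the remaining two in a second. Below is a sketch of the CPU-side packing with hypothetical names. The shader would undo it with shifts and masks on integer attributes, e.g. `(packed >> 8u) & 0xFFu` in GLSL. Integer attributes are set up with `glVertexAttribIPointer` rather than `glVertexAttribPointer`.

```java
final class FacePacking {
    // Packs six 8-bit texture indices (faces 0..5) into two ints:
    // word 0 holds faces 0-3 (face 0 in the low byte), word 1 holds faces 4-5.
    static int[] pack(int[] faceTextures) {
        if (faceTextures.length != 6) throw new IllegalArgumentException("need 6 faces");
        int[] words = new int[2];
        for (int face = 0; face < 6; face++) {
            int tex = faceTextures[face];
            if (tex < 0 || tex > 255) throw new IllegalArgumentException("index must fit in 8 bits");
            words[face / 4] |= tex << (8 * (face % 4));
        }
        return words;
    }

    // Inverse of pack, mirroring what the shader would do.
    static int unpack(int[] words, int face) {
        return (words[face / 4] >>> (8 * (face % 4))) & 0xFF;
    }
}
```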

Sam
  • I wasn't using `gl_InstanceID` -- you'd use that with a uniform holding your texture indices I suppose? Didn't think of doing it that way. I'm using `glDrawElementsInstancedBaseVertex`, and putting the instance data in the GL_ARRAY_BUFFER with usage hint GL_DYNAMIC_DRAW. Where does "ARB_instanced_arrays" come into play? Not sure what that is. – mpen Feb 05 '12 at 20:18
  • Your choice of "2" seems at odds with your first paragraph. "1" would be the "more middle" solution, no? 1 uses instances, just with smaller chunks of data in each pass. You'd go with 2, even with say, 200K cubes on screen? – mpen Feb 05 '12 at 20:20
  • EDIT: Yes, I'd do benchmarks first. You will run out of uniform components sooner or later, so I'd use texture buffer objects. – Sam Feb 05 '12 at 20:25
  • The alternative to using texture buffer objects with gl_InstanceID is the aforementioned instanced arrays. The 8600M GT is rather old; newer cards are better at hardware-supported instancing, where fewer vertices may be advantageous due to caching, etc. I can't give a clear recommendation here. I'd postpone this for benchmarks and later optimization, as it shouldn't break your basic concept. – Sam Feb 05 '12 at 20:46
  • Well I'm about to embark down one path or the other. I'd rather not waste too much time on benchmarking this early on until it becomes a barrier, as you said. The problem though is I'm already down to about 60 FPS and even lower (~20) at full screen. This is before any pruning though... that's another issue I haven't worked out yet. – mpen Feb 05 '12 at 20:54