When I create a 100 x 100 chunk of cubes in Bevy, it can only maintain about 10 fps.
Even if I replace the cubes with something simpler, like planes, I don't get any better performance.
I benchmarked it with MangoHud, and it says that my CPU and GPU are only sitting at about 20% usage.

Here is the code I use to generate a 32 x 32 chunk with OpenSimplex noise:

    // Assumes Bevy 0.4 and the noise crate 0.7 (where OpenSimplex::new() takes no seed).
    use bevy::prelude::*;
    use noise::{NoiseFn, OpenSimplex};

    // Bevy 0.4 system; `spawn_chunk` is a placeholder name. `Seed` (a resource with an
    // f64 `value`), `Chunk` and `Cube` are the asker's own types and are not shown.
    fn spawn_chunk(
        commands: &mut Commands,
        mut meshes: ResMut<Assets<Mesh>>,
        mut materials: ResMut<Assets<StandardMaterial>>,
        asset_server: Res<AssetServer>,
        seed: Res<Seed>,
    ) {
        let noise = OpenSimplex::new();

        commands
            .spawn(PbrBundle {
                mesh: meshes.add(Mesh::from(shape::Plane { size: 1.0 })),
                material: materials.add(Color::rgb(0.5, 0.5, 1.0).into()),
                transform: Transform::from_translation(Vec3::new(0.0, 0.0, 0.0)),
                ..Default::default()
            })
            .with(Chunk)
            .with_children(|parent| {
                let texture_handle = asset_server.load("textures/dirt.png");

                for x in -32..32 {
                    for z in -32..32 {
                        // Column height from 3D OpenSimplex noise; the seed is the third coordinate.
                        let y = (noise.get([
                            (x as f32 / 20.) as f64,
                            (z as f32 / 20.) as f64,
                            seed.value,
                        ]) * 15.
                            + 16.0) as u32;

                        // One cube entity per column, each with its own mesh and material asset.
                        parent
                            .spawn(PbrBundle {
                                mesh: meshes.add(Mesh::from(shape::Cube { size: 1.0 })),
                                material: materials.add(StandardMaterial {
                                    albedo: Color::rgba(1.0, 1.0, 1.0, 1.0),
                                    albedo_texture: Some(texture_handle.clone()),
                                    ..Default::default()
                                }),
                                transform: Transform::from_translation(Vec3::new(x as f32, y as f32, z as f32)),
                                ..Default::default()
                            })
                            .with(Cube);
                    }
                }
            });
    }

But 32 x 32 is the absolute maximum for a playable experience. What do I have to do to be able to draw multiple chunks at the same time?

System specs:

  • CPU: Intel Core i7-6820HQ CPU @ 2.70GHz
  • iGPU: Intel HD Graphics 530
  • dGPU: Nvidia Quadro M2000M

But even when offloading to the more powerful dGPU, I don't get any better performance.

Bruno Wallner
  • You can use something like https://github.com/bonsairobo/building-blocks, or at least get some inspiration from it. – frankenapps Mar 13 '21 at 09:38

4 Answers


Some optimizations that are immediately visible:

  • Convert your nested for loops into a single for loop.
    • It's more cache-friendly.
    • Use math to split the now-single index into x/y/z values to determine position.
  • Hidden surface removal.
    • During mesh creation, instead of adding a whole new cube to the mesh (6 faces, 12 triangles, 24 vertices), only add the faces (2 triangles each) that are actually visible, i.e. those that do not have a neighboring opaque (non-air) block in that direction.
  • Use indexed drawing instead of non-indexed (vertex-only) drawing.
  • Use a TextureAtlas.
    • Use one big texture shared by every cube instead of a separate texture per cube.
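
The first three suggestions can be combined in a single chunk mesher. The sketch below is a minimal, engine-independent illustration, assuming a hypothetical 16 x 16 x 16 chunk stored as a flat `Vec<bool>` (x varying fastest, `true` = solid) and a hypothetical `MeshData` struct standing in for Bevy's `Mesh`. It walks the voxels with one flat loop, splits the index back into x/y/z, emits a face only when the neighbour in that direction is air, and writes each face as 4 vertices plus 6 indices.

```rust
// Hypothetical chunk size; the voxels live in one flat Vec<bool> (true = solid).
const SX: usize = 16;
const SY: usize = 16;
const SZ: usize = 16;
const N: usize = SX * SY * SZ;

/// Flat index of (x, y, z), with x varying fastest.
fn linearize(x: usize, y: usize, z: usize) -> usize {
    x + SX * (y + SY * z)
}

/// Split a flat index back into (x, y, z).
fn delinearize(i: usize) -> (usize, usize, usize) {
    (i % SX, (i / SX) % SY, i / (SX * SY))
}

/// Whether the voxel at a possibly out-of-range position is solid; outside the chunk counts as air.
fn solid_at(voxels: &[bool], x: i64, y: i64, z: i64) -> bool {
    if x < 0 || y < 0 || z < 0 || x >= SX as i64 || y >= SY as i64 || z >= SZ as i64 {
        return false;
    }
    voxels[linearize(x as usize, y as usize, z as usize)]
}

/// Plain vertex/index buffers (hypothetical stand-in for an engine mesh type).
#[derive(Default)]
struct MeshData {
    positions: Vec<[f32; 3]>,
    indices: Vec<u32>,
}

/// Per face: the neighbour offset it faces, and its 4 corners, counter-clockwise seen from outside.
const FACES: [([i64; 3], [[f32; 3]; 4]); 6] = [
    ([1, 0, 0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]),
    ([-1, 0, 0], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 1.0, 0.0]]),
    ([0, 1, 0], [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 0.0]]),
    ([0, -1, 0], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]),
    ([0, 0, 1], [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]),
    ([0, 0, -1], [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]),
];

/// Build one mesh for the whole chunk, emitting only faces that touch air.
fn mesh_chunk(voxels: &[bool]) -> MeshData {
    assert_eq!(voxels.len(), N);
    let mut mesh = MeshData::default();
    // One flat loop over all voxels; the position is recovered from the index.
    for i in 0..N {
        if !voxels[i] {
            continue;
        }
        let (x, y, z) = delinearize(i);
        for (offset, corners) in FACES.iter() {
            let (nx, ny, nz) = (x as i64 + offset[0], y as i64 + offset[1], z as i64 + offset[2]);
            // Hidden surface removal: a face covered by a solid neighbour is never visible.
            if solid_at(voxels, nx, ny, nz) {
                continue;
            }
            let base = mesh.positions.len() as u32;
            for c in corners.iter() {
                mesh.positions.push([x as f32 + c[0], y as f32 + c[1], z as f32 + c[2]]);
            }
            // Indexed drawing: 4 vertices + 6 indices per face instead of 6 separate vertices.
            mesh.indices.extend_from_slice(&[base, base + 1, base + 2, base, base + 2, base + 3]);
        }
    }
    mesh
}

fn main() {
    // A completely solid chunk: only the outer shell survives culling.
    let mesh = mesh_chunk(&vec![true; N]);
    // prints "1536 faces, 6144 vertices" (one cube per voxel would be 24576 faces)
    println!("{} faces, {} vertices", mesh.indices.len() / 6, mesh.positions.len());
}
```

In Bevy, buffers like these would go into one `Mesh` per chunk, so a chunk becomes a single entity instead of thousands of cube entities. How positions and indices are attached to a `Mesh` differs between Bevy versions, so that step (as well as normals and UVs) is left out of this sketch.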
Casey
  • Converting a nested loop into a non-nested one neither improves performance nor changes the complexity of the algorithm. – Acorn Mar 13 '21 at 02:18
  • @Acorn It depends on the dimensions of the loops; in this case they are equal. However, a flat loop is more cache-friendly than a nested loop. – Casey Mar 13 '21 at 02:24
  • No, it doesn't depend on whether the dimensions are equal; I am not sure what you are trying to say. Whether a loop is nested or not does not matter for caches either; locality of accesses does. – Acorn Mar 13 '21 at 02:41
  • The texture atlas suggestion is also strange. What OP should be doing is not creating an atlas but reusing the texture. – Acorn Mar 13 '21 at 02:44
  • I reduced the faces in mesh creation by spawning only planes instead of cubes, so instead of rendering 1200 triangles it now renders only 200. But this does not help: I don't gain a single frame per second. Could this be some kind of limitation of the engine (Bevy)? – Bruno Wallner Mar 13 '21 at 13:50
  • @BrunoWallner Probably. The documentation says it is very new and unstable. – Casey Mar 13 '21 at 14:38
  • Clone the mesh and material handles instead of adding new assets every time. The handles that `meshes.add` returns are cloneable. That's going to be a lot less work for Bevy. – Squirrel Jun 16 '22 at 21:07
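
On the nested-versus-flat loop question raised in the comments: what decides cache behaviour is the order in which memory addresses are visited relative to the storage layout, not whether the loop is nested. A minimal sketch, assuming a hypothetical tiny chunk stored flat with x varying fastest: a nested loop whose innermost level runs over x visits exactly the same addresses in the same order as a flat loop, while the same nesting with z innermost jumps `SX * SY` slots between consecutive accesses.

```rust
// Hypothetical tiny chunk, stored flat with x varying fastest in memory.
const SX: usize = 4;
const SY: usize = 3;
const SZ: usize = 2;

/// Flat index of voxel (x, y, z) in the x-fastest layout.
fn index(x: usize, y: usize, z: usize) -> usize {
    x + SX * (y + SY * z)
}

/// Nested loops whose innermost loop runs over the fastest-varying axis (x).
fn nested_memory_order() -> Vec<usize> {
    let mut visited = Vec::new();
    for z in 0..SZ {
        for y in 0..SY {
            for x in 0..SX {
                visited.push(index(x, y, z));
            }
        }
    }
    visited
}

/// Same nesting depth, but with z innermost: consecutive accesses jump SX * SY slots.
fn nested_strided_order() -> Vec<usize> {
    let mut visited = Vec::new();
    for x in 0..SX {
        for y in 0..SY {
            for z in 0..SZ {
                visited.push(index(x, y, z));
            }
        }
    }
    visited
}

/// A single flat loop over the same array.
fn flat_order() -> Vec<usize> {
    (0..SX * SY * SZ).collect()
}

fn main() {
    // Nested-in-memory-order and flat loops touch the same addresses in the same order.
    assert_eq!(nested_memory_order(), flat_order());
    let strided = nested_strided_order();
    // prints "first strided accesses: [0, 12, 4, 16]"
    println!("first strided accesses: {:?}", &strided[..4]);
}
```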

Hidden surface removal, and not naively generating a cube mesh per voxel, is the real answer; however, there is some low-hanging fruit that might explain why performance is so low: you only need one mesh and one material to render all those cubes.

    // Assumes Bevy 0.4 and the noise crate 0.7, as in the question.
    use bevy::prelude::*;
    use noise::{NoiseFn, OpenSimplex};

    // `spawn_chunk` is a placeholder name; `Seed`, `Chunk` and `Cube` are the asker's own types.
    fn spawn_chunk(
        commands: &mut Commands,
        mut meshes: ResMut<Assets<Mesh>>,
        mut materials: ResMut<Assets<StandardMaterial>>,
        asset_server: Res<AssetServer>,
        seed: Res<Seed>,
    ) {
        let noise = OpenSimplex::new();

        commands
            .spawn(PbrBundle {
                mesh: meshes.add(Mesh::from(shape::Plane { size: 1.0 })),
                material: materials.add(Color::rgb(0.5, 0.5, 1.0).into()),
                transform: Transform::from_translation(Vec3::new(0.0, 0.0, 0.0)),
                ..Default::default()
            })
            .with(Chunk)
            .with_children(|parent| {
                // Create the cube mesh and its material once, outside the loop...
                let texture_handle = asset_server.load("textures/dirt.png");
                let mesh_handle = meshes.add(Mesh::from(shape::Cube { size: 1.0 }));
                let material_handle = materials.add(StandardMaterial {
                    albedo: Color::rgba(1.0, 1.0, 1.0, 1.0),
                    albedo_texture: Some(texture_handle),
                    ..Default::default()
                });

                for x in -32..32 {
                    for z in -32..32 {
                        let y = (noise.get([
                            (x as f32 / 20.) as f64,
                            (z as f32 / 20.) as f64,
                            seed.value,
                        ]) * 15.
                            + 16.0) as u32;

                        // ...and give every cube a cheap clone of the same handles
                        // (Handle is not Copy, so it must be cloned, not moved, in the loop).
                        parent
                            .spawn(PbrBundle {
                                mesh: mesh_handle.clone(),
                                material: material_handle.clone(),
                                transform: Transform::from_translation(Vec3::new(x as f32, y as f32, z as f32)),
                                ..Default::default()
                            })
                            .with(Cube);
                    }
                }
            });
    }

It is actually the engine's fault, but it will improve with the 0.5.0 release.

Bruno Wallner
  • I disagree. This example: https://github.com/bonsairobo/building-blocks/tree/main/examples/array_texture_materials runs at more than 1000 fps on my GTX 1060 with Bevy 0.4. Not quite 100000 voxels, but still pretty good. Are you running it with `cargo run --release`? – frankenapps Mar 15 '21 at 11:59

Use a meshing algorithm such as greedy meshing. block-mesh is a good Rust library that already implements a couple of voxel meshing algorithms.

Also see vx-bevy for ideas on how to build a voxel engine.
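
Greedy meshing goes one step beyond hidden surface removal: adjacent visible faces that lie in the same plane are merged into larger rectangles, so a flat patch of terrain becomes a handful of quads instead of one quad per block. Below is a minimal sketch of the core 2D step only, assuming the visible faces of one slice of a chunk have already been collected into a row-major `bool` mask (a hypothetical layout). A full mesher would run it for every slice along every axis and face direction, and would only merge faces with the same block type. This is not block-mesh's API; for real use, that crate's implementation is the better choice.

```rust
/// A rectangle of merged unit faces in one 2D slice: origin (x, y), size w x h.
#[derive(Debug, PartialEq)]
struct Quad {
    x: usize,
    y: usize,
    w: usize,
    h: usize,
}

/// A cell still needs a face if it is set in the mask and not yet covered by a quad.
fn is_open(mask: &[bool], used: &[bool], i: usize) -> bool {
    mask[i] && !used[i]
}

/// Greedily merge the set cells of a row-major `width` x `height` mask into rectangles.
fn greedy_mesh_2d(mask: &[bool], width: usize, height: usize) -> Vec<Quad> {
    assert_eq!(mask.len(), width * height);
    let mut used = vec![false; mask.len()];
    let mut quads = Vec::new();
    for y in 0..height {
        for x in 0..width {
            if !is_open(mask, &used, x + y * width) {
                continue;
            }
            // Extend the quad along x as far as the run of open cells goes.
            let mut w = 1;
            while x + w < width && is_open(mask, &used, x + w + y * width) {
                w += 1;
            }
            // Extend along y while the whole row segment [x, x + w) is open.
            let mut h = 1;
            'grow: while y + h < height {
                for dx in 0..w {
                    if !is_open(mask, &used, x + dx + (y + h) * width) {
                        break 'grow;
                    }
                }
                h += 1;
            }
            // Mark the covered cells so they are not emitted again.
            for dy in 0..h {
                for dx in 0..w {
                    used[x + dx + (y + dy) * width] = true;
                }
            }
            quads.push(Quad { x, y, w, h });
        }
    }
    quads
}

fn main() {
    // 4 x 3 mask with six visible faces:
    // ##..
    // ##.#
    // ...#
    let mask = [
        true, true, false, false,
        true, true, false, true,
        false, false, false, true,
    ];
    // Two quads (a 2 x 2 block and a 1 x 2 strip) instead of six unit faces.
    println!("{:?}", greedy_mesh_2d(&mask, 4, 3));
}
```

Greedy merging is not guaranteed to find the minimum number of rectangles, but it is cheap and usually cuts the face count substantially on terrain-like data.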