
What's the efficient way to render a bunch of layered textures? I have some semitransparent textured rectangles that I position randomly in 3D space and render them from back to front.

Currently I call d3dContext->PSSetShaderResources() to feed the pixel shader with a new texture before each call to d3dContext->DrawIndexed(). I have a feeling that I am copying the texture to GPU memory before each draw. I might have 10-30 ARGB textures, roughly 1024x1024 pixels each, shared across the 100-200 rectangles that I render on screen. My FPS is OK with around 100 rectangles, but gets pretty bad around 200. I possibly have some inefficiencies elsewhere, since this is my first semi-serious D3D code, but I strongly suspect this has to do with copying the textures back and forth. 30*1024*1024*4 is 120MB, which is a bit high for a Metro Style App that should target any Windows 8 device. So putting them all in there might be a stretch, but maybe I could at least cache a few somehow? Any ideas?

*EDIT - Some code snippets added

Constant Buffer

struct ModelViewProjectionConstantBuffer
{
    DirectX::XMMATRIX model;
    DirectX::XMMATRIX view;
    DirectX::XMMATRIX projection;
    float opacity;
    DirectX::XMFLOAT3 highlight;
    DirectX::XMFLOAT3 shadow;
    float textureTransitionAmount;
};

The Render Method

void RectangleRenderer::Render()
{
    // Clear background and depth stencil
    const float backgroundColorRGBA[] = { 0.35f, 0.35f, 0.85f, 1.000f };
    m_d3dContext->ClearRenderTargetView(
        m_renderTargetView.Get(),
        backgroundColorRGBA
        );

    m_d3dContext->ClearDepthStencilView(
        m_depthStencilView.Get(),
        D3D11_CLEAR_DEPTH,
        1.0f,
        0
        );

    // Don't draw anything else until all textures are loaded
    if (!m_loadingComplete)
        return;

    m_d3dContext->OMSetRenderTargets(
        1,
        m_renderTargetView.GetAddressOf(),
        m_depthStencilView.Get()
        );

    UINT stride = sizeof(BasicVertex);
    UINT offset = 0;

    // The vertex buffer only holds the 4 vertices of one rectangle
    m_d3dContext->IASetVertexBuffers(
        0,
        1,
        m_vertexBuffer.GetAddressOf(),
        &stride,
        &offset
        );

    // The index buffer only holds the indices of one rectangle
    m_d3dContext->IASetIndexBuffer(
        m_indexBuffer.Get(),
        DXGI_FORMAT_R16_UINT,
        0
        );

    m_d3dContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

    m_d3dContext->IASetInputLayout(m_inputLayout.Get());

    FLOAT blendFactors[4] = { 0, };
    m_d3dContext->OMSetBlendState(m_blendState.Get(), blendFactors, 0xffffffff);

    m_d3dContext->VSSetShader(
        m_vertexShader.Get(),
        nullptr,
        0
        );

    m_d3dContext->PSSetShader(
        m_pixelShader.Get(),
        nullptr,
        0
        );

    m_d3dContext->PSSetSamplers(
        0,                          // starting at the first sampler slot
        1,                          // set one sampler binding
        m_sampler.GetAddressOf()
        );

    // number of rectangles is in the 100-200 range
    for (int i = 0; i < m_rectangles.size(); i++)
    {
        // start rendering from the farthest rectangle
        int j = (i + m_farthestRectangle) % m_rectangles.size();

        m_vsConstantBufferData.model = m_rectangles[j].transform;
        m_vsConstantBufferData.opacity = m_rectangles[j].Opacity;
        m_vsConstantBufferData.highlight = m_rectangles[j].Highlight;
        m_vsConstantBufferData.shadow = m_rectangles[j].Shadow;
        m_vsConstantBufferData.textureTransitionAmount = m_rectangles[j].textureTransitionAmount;


        m_d3dContext->UpdateSubresource(
            m_vsConstantBuffer.Get(),
            0,
            NULL,
            &m_vsConstantBufferData,
            0,
            0
            );

        m_d3dContext->VSSetConstantBuffers(
            0,
            1,
            m_vsConstantBuffer.GetAddressOf()
            );

        m_d3dContext->PSSetConstantBuffers(
            0,
            1,
            m_vsConstantBuffer.GetAddressOf()
            );

        // Bind the two textures blended for this rectangle
        ID3D11ShaderResourceView* srvs[2] = {
            m_textures[m_rectangles[j].textureId].textureSRV.Get(),
            m_textures[m_rectangles[j].targetTextureId].textureSRV.Get()
        };

        m_d3dContext->PSSetShaderResources(
            0,                          // starting at the first shader resource slot
            2,                          // set two shader resource bindings
            srvs
            );

        m_d3dContext->DrawIndexed(
            m_indexCount,
            0,
            0
            );
    }
}

Pixel Shader

cbuffer ModelViewProjectionConstantBuffer : register(b0)
{
    matrix model;
    matrix view;
    matrix projection;
    float opacity;
    float3 highlight;
    float3 shadow;
    float textureTransitionAmount;
};

Texture2D baseTexture : register(t0);
Texture2D targetTexture : register(t1);
SamplerState simpleSampler : register(s0);

struct PixelShaderInput
{
    float4 pos : SV_POSITION;
    float3 norm : NORMAL;
    float2 tex : TEXCOORD0;
};

float4 main(PixelShaderInput input) : SV_TARGET
{
    float3 lightDirection = normalize(float3(0, 0, -1));

    float4 baseTexelColor = baseTexture.Sample(simpleSampler, input.tex);
    float4 targetTexelColor = targetTexture.Sample(simpleSampler, input.tex);
    float4 texelColor = lerp(baseTexelColor, targetTexelColor, textureTransitionAmount);
    float4 shadedColor;
    shadedColor.rgb = lerp(shadow.rgb, highlight.rgb, texelColor.r);
    shadedColor.a = texelColor.a * opacity;
    return shadedColor;
}
Filip Skakun

2 Answers


As Jeremiah has suggested, you are probably not moving textures from CPU to GPU each frame: to do that you would have to create a new texture each frame, or use the "UpdateSubresource" or "Map/Unmap" methods on the texture itself.

I don't think that instancing is going to help for this specific case, as the number of polygons is extremely low (I would start to worry with several millions of polygons). It is more likely that your application is bandwidth/fillrate limited, as you are performing lots of texture sampling/blending (it depends on the texture fillrate, pixel fillrate and the number of ROPs on your GPU).

In order to achieve better performance, it is highly recommended to:

  • Make sure that all your textures have full mipmap chains generated. Textures without mipmaps badly hurt the GPU's texture cache. (I also assume that you are using the texture.Sample method in HLSL, and not texture.SampleLevel or variants.)
  • Use Direct3D 11 block-compressed textures on the GPU, produced with a tool like texconv.exe or, preferably, the sample from "Windows DirectX 11 Texture Converter".

On a side note, you will probably get more attention for this kind of question on https://gamedev.stackexchange.com/.

xoofx
  • I am calling UpdateSubresource for each of the 150 rectangles, but I only pass a few parameters there that define unique properties of each element. I also call PSSetShaderResources where I pass two textures that are blended to render each rectangle. Let me share some code. – Filip Skakun Jun 02 '12 at 04:44
  • Added some code. I do have a feeling this might be related to these fillrates. I was thinking the GPU should handle it, but since it is all textures with transparencies - I basically get 200-300 million texels processed every frame. I need to look into limiting the number of objects... :) – Filip Skakun Jun 02 '12 at 05:00
  • No doubt that the bandwidth/fillrate is the bottleneck. Also, if you can precompute part of the baseTexture/targetTexture blending at runtime (if you have a limited set of combinations, for example), that could be helpful, but you would lose compressed-texture features. I don't fully understand the logic behind your formulas (for example, you are only using "texelColor.r" and "texelColor.a" for blending between shadow/highlight and the final opacity, without using the full range of the original texelColor.rgba). – xoofx Jun 03 '12 at 14:08
  • Thanks. The textures are basically monochrome clouds with alpha. I will need to verify that with my designer. I suppose maybe I could somehow encode them with two channels only. Maybe somehow decode the PNGs to DXGI_FORMAT_R8G8_UNORM? – Filip Skakun Jun 03 '12 at 21:43
  • You should use DXGI_FORMAT_BC5_UNORM (Check compressed file link above) and convert your PNG to DDS using this format (see texture converter tool. Authoring can still be done with png or whatever, but your asset should be converted when used from C++). To load DDS, check Windows 8 RP C++ samples. If your artist is working with Photoshop, NVidia provides an integrated tool to generate DDS (http://developer.nvidia.com/nvidia-texture-tools-adobe-photoshop) – xoofx Jun 04 '12 at 00:51
  • I'm using the BasicLoader::LoadTexture/DDSTextureLoader that uses WIC to decode the PNG. So converting to DDS from PNGs before packaging the assets could reduce the GPU overhead? Is BC5 UNORM going to drop the color information and leave it as grayscale+alpha? – Filip Skakun Jun 04 '12 at 05:20
  • Yes, compressed textures are decompressed at runtime, so they improve the usage of the bandwidth. Also, double check that your final DDS does have generated mipmaps. – xoofx Jun 04 '12 at 07:48
  • As stated in MSDN documentation, BC5 is a "Two color channels (8 bits:8 bits)". You will have to get red/green channels from sampling. A good introduction to block compression: http://www.reedbeta.com/blog/2012/02/12/understanding-bcn-texture-compression-formats/ – xoofx Jun 04 '12 at 08:21
  • Cool, but now I have to figure out how to convert it. :) – Filip Skakun Jun 04 '12 at 15:53
  • The link to the tool is in my original response, it's easy to use it, just a command line – xoofx Jun 05 '12 at 00:54

I don't think you are doing any copying back and forth from GPU to system memory. You usually have to do that explicitly, with a call to Map(...), or by blitting to a texture you created in system memory.

One issue is that you are making a DrawIndexed(...) call for each texture. GPUs work most efficiently when you batch work and send it a bunch at once. One way to accomplish this is to set n textures with PSSetShaderResources(i, ...) and make a single DrawIndexedInstanced(...) call. Your shader code would then read from each of the shader resources and draw them. I do this in my C++ DirectCanvas code here (SpriteInstanced.cpp). This can make for a lot of code, but the result is very efficient (I even do the matrix ops in the shader for extra speed).

Another, maybe much easier, way is to give the DirectXTK SpriteBatch a shot.

I used it here in this project...only for a simple blit but it may be a good start to see the minor amount of setup needed to use the spritebatch.

Also, if possible, try to "atlas" your textures. For instance, pack as many "images" as possible into one texture and draw sub-rectangles from it, instead of having a separate texture for each.

Jeremiah Morrill
  • I'm actually not getting data back to system RAM, but still worrying about feeding the textures to the GPU. I could not find how to efficiently place my rectangle models in 3D space, so I basically track model/view/projection matrices and a few float parameters for each rectangle in my app and then, for each rectangle, feed the GPU with my constant buffer (->UpdateSubresource, ->VSSetConstantBuffers, ->PSSetConstantBuffers) and textures (->PSSetShaderResources), then call DrawIndexed. I did hear of instancing - I guess I'll need to read more and start using it. DXTK isn't 2D only? – Filip Skakun Jun 01 '12 at 20:58