NicolBolas is correct: the hardware would either have to transparently expand 3-channel data into 4-channel image memory, or support 3-channel image formats directly. But instead of doing something slow like pixel-by-pixel copies from the host, why not take full advantage of the device's throughput and use a compute shader: upload your texture data as a storage buffer, then read it out in the correct format on the GPU.
Here's what I mean:
You've got a 3-channel image, but at this point it is just data.
You upload it to the device as a plain buffer for now.
In a compute shader, read the three channels from the buffer and write the result out to an image (or a second buffer) directly on the device.
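For reference, the expansion the compute shader performs is equivalent to this host-side loop (an illustrative sketch, not part of the GPU path; the function name is made up). On the GPU, each invocation handles one pixel instead of looping:

```cpp
#include <cstddef>
#include <vector>

// Reference for what the compute shader does per pixel: expand tightly
// packed RGB floats to RGBA, filling alpha with 1.0.
std::vector<float> rgb_to_rgba(const std::vector<float>& rgb)
{
    std::vector<float> rgba;
    rgba.reserve(rgb.size() / 3 * 4);
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        rgba.push_back(rgb[i]);     // R
        rgba.push_back(rgb[i + 1]); // G
        rgba.push_back(rgb[i + 2]); // B
        rgba.push_back(1.0f);       // A
    }
    return rgba;
}
```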
Example using images:
As pointed out, you could also write directly to an image buffer, meaning you don't have to invoke a separate buffer copy to an image.
#version 450

// Note: a vec3[] array in std430 still has a 16-byte stride (vec3 aligns
// like vec4), so use a float array to keep the 3-channel data tightly packed.
// https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf
layout(std430, binding = 0) buffer in_3_channel
{
    float input_channel[ ];
};

layout (binding = 1, rgba32f) uniform image2D result_image;

layout (local_size_x = 256) in;

layout (binding = 2) uniform UBO
{
    int image_size; // total number of pixels
    int image_cols; // image width in pixels
} ubo;

void main()
{
    uint index = gl_GlobalInvocationID.x;
    if (index >= uint(ubo.image_size)) {
        return;
    }
    vec3 in_color = vec3(input_channel[3 * index],
                         input_channel[3 * index + 1],
                         input_channel[3 * index + 2]);
    vec4 out_color = vec4(in_color, 1.0);
    int row = int(index) / ubo.image_cols;
    int col = int(index) % ubo.image_cols;
    // imageStore coordinates are (x, y), i.e. (column, row)
    ivec2 image_write_location = ivec2(col, row);
    imageStore(result_image, image_write_location, out_color);
}
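With local_size_x = 256, the number of workgroups to dispatch is the pixel count divided by 256, rounded up; the `index >= image_size` guard in the shader discards the excess invocations in the last group. A minimal sketch of the calculation (the vkCmdDispatch call is shown as a comment since it needs a live command buffer; `cmd`, `texWidth`, and `texHeight` are assumed names):

```cpp
#include <cstdint>

// Ceiling division: how many workgroups of local_size_x invocations are
// needed to cover image_size pixels.
uint32_t group_count(uint32_t image_size, uint32_t local_size_x)
{
    return (image_size + local_size_x - 1) / local_size_x;
}

// usage (assumes a VkCommandBuffer `cmd` in the recording state):
// vkCmdDispatch(cmd, group_count(texWidth * texHeight, 256), 1, 1);
```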
Example using buffers:
#version 450

// Note: a vec3[] array in std430 still has a 16-byte stride (vec3 aligns
// like vec4), so use a float array to keep the 3-channel data tightly packed.
// https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf
layout(std430, binding = 0) buffer in_3_channel
{
    float input_channel[ ];
};

layout(std430, binding = 1) buffer out_4_channel
{
    vec4 output_channel[ ];
};

// alternatively you could write to an image directly, e.g.:
//
// layout (binding = 1, rgba32f) uniform image2D result_image;

layout (local_size_x = 256) in;

layout (binding = 2) uniform UBO
{
    int image_size; // total number of pixels
} ubo;

void main()
{
    uint index = gl_GlobalInvocationID.x;
    if (index >= uint(ubo.image_size)) {
        return;
    }
    vec3 in_color = vec3(input_channel[3 * index],
                         input_channel[3 * index + 1],
                         input_channel[3 * index + 2]);
    vec4 out_color = vec4(in_color, 1.0);
    output_channel[index] = out_color;

    // writing to an image instead (image_cols = image width):
    // ivec2 image_write_location = ivec2(int(index) % image_cols, int(index) / image_cols);
    // imageStore(result_image, image_write_location, out_color);
}
If you then want to copy the buffer to a texture, you can transfer the data to an image as in this example: https://vulkan-tutorial.com/Texture_mapping/Images
// stagingBuffer is your converted RGB -> RGBA buffer.
transitionImageLayout(textureImage, VK_FORMAT_R32G32B32A32_SFLOAT, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
transitionImageLayout(textureImage, VK_FORMAT_R32G32B32A32_SFLOAT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
Conclusion
Now you've gained the host->device bandwidth advantage, taken advantage of the compute capabilities of the GPU, and touched the host only for the initial transfer of the 3-channel data.