NicolBolas is correct: the hardware would either have to transparently expand 3-channel data into 4-channel image memory, or support 3-channel image formats directly. But instead of doing something slow like pixel-by-pixel copies from the host, why not take full advantage of the device's throughput and use a compute shader: upload your texture data as a storage buffer, then read it out in the correct format on the GPU.
Here's what I mean:
You've got a 3-channel image, but at this point it is just data.
You upload it to the device as a plain buffer for now.
In a compute shader, read the three channels from the buffer and write the result out to an image (or a second buffer) directly on the device.
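For reference, the expansion the compute shader performs is equivalent to this host-side loop (an illustrative sketch, not part of the GPU path; the function name is made up). On the GPU, each invocation handles one pixel instead of looping:

```cpp
#include <cstddef>
#include <vector>

// Reference for what the compute shader does per pixel: expand tightly
// packed RGB floats to RGBA, filling alpha with 1.0.
std::vector<float> rgb_to_rgba(const std::vector<float>& rgb)
{
    std::vector<float> rgba;
    rgba.reserve(rgb.size() / 3 * 4);
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        rgba.push_back(rgb[i]);     // R
        rgba.push_back(rgb[i + 1]); // G
        rgba.push_back(rgb[i + 2]); // B
        rgba.push_back(1.0f);       // A
    }
    return rgba;
}
```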
Example using images:
As pointed out, you could also write directly to an image buffer, meaning you don't have to invoke a separate buffer copy to an image.
#version 450

// Note: a vec3[] array in std430 still has a 16-byte stride (vec3 aligns
// like vec4), so use a float array to keep the 3-channel data tightly packed.
// https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf
layout(std430, binding = 0) buffer in_3_channel
{
    float input_channel[ ];
};

layout (binding = 1, rgba32f) uniform image2D result_image;

layout (local_size_x = 256) in;

layout (binding = 2) uniform UBO
{
    int image_size; // total number of pixels
    int image_cols; // image width in pixels
} ubo;

void main()
{
    uint index = gl_GlobalInvocationID.x;
    if (index >= uint(ubo.image_size)) {
        return;
    }
    vec3 in_color = vec3(input_channel[3 * index],
                         input_channel[3 * index + 1],
                         input_channel[3 * index + 2]);
    vec4 out_color = vec4(in_color, 1.0);
    int row = int(index) / ubo.image_cols;
    int col = int(index) % ubo.image_cols;
    // imageStore coordinates are (x, y), i.e. (column, row)
    ivec2 image_write_location = ivec2(col, row);
    imageStore(result_image, image_write_location, out_color);
}
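With local_size_x = 256, the number of workgroups to dispatch is the pixel count divided by 256, rounded up; the `index >= image_size` guard in the shader discards the excess invocations in the last group. A minimal sketch of the calculation (the vkCmdDispatch call is shown as a comment since it needs a live command buffer; `cmd`, `texWidth`, and `texHeight` are assumed names):

```cpp
#include <cstdint>

// Ceiling division: how many workgroups of local_size_x invocations are
// needed to cover image_size pixels.
uint32_t group_count(uint32_t image_size, uint32_t local_size_x)
{
    return (image_size + local_size_x - 1) / local_size_x;
}

// usage (assumes a VkCommandBuffer `cmd` in the recording state):
// vkCmdDispatch(cmd, group_count(texWidth * texHeight, 256), 1, 1);
```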
Example using buffers:
#version 450

// Note: a vec3[] array in std430 still has a 16-byte stride (vec3 aligns
// like vec4), so use a float array to keep the 3-channel data tightly packed.
// https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf
layout(std430, binding = 0) buffer in_3_channel
{
    float input_channel[ ];
};

layout(std430, binding = 1) buffer out_4_channel
{
    vec4 output_channel[ ];
};

// alternatively you could write to an image directly, e.g.:
//
// layout (binding = 1, rgba32f) uniform image2D result_image;

layout (local_size_x = 256) in;

layout (binding = 2) uniform UBO
{
    int image_size; // total number of pixels
} ubo;

void main()
{
    uint index = gl_GlobalInvocationID.x;
    if (index >= uint(ubo.image_size)) {
        return;
    }
    vec3 in_color = vec3(input_channel[3 * index],
                         input_channel[3 * index + 1],
                         input_channel[3 * index + 2]);
    vec4 out_color = vec4(in_color, 1.0);
    output_channel[index] = out_color;

    // writing to an image instead (image_cols = image width):
    // ivec2 image_write_location = ivec2(int(index) % image_cols, int(index) / image_cols);
    // imageStore(result_image, image_write_location, out_color);
}
If you then want to copy the buffer to a texture, you can transfer the data to an image as in this example: https://vulkan-tutorial.com/Texture_mapping/Images
// stagingBuffer is your converted RGB -> RGBA buffer.
transitionImageLayout(textureImage, VK_FORMAT_R32G32B32A32_SFLOAT, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
transitionImageLayout(textureImage, VK_FORMAT_R32G32B32A32_SFLOAT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
Conclusion
Now you've gained the host->device bandwidth advantage, taken advantage of the compute capabilities of the GPU, and touched the host only for the initial transfer of the 3-channel data.