I'm trying to follow along with the "hello compute" example from wgpu on Windows 10 (with some minor modifications, mainly gutting the shader so it does basically no actual computing), but when I read the buffer at the end, it's always zeroed out.
This is the shader I'm trying to run. It compiles fine and I think it's correct:
[[block]]
struct Numbers
{
    data: [[stride(4)]] array<u32>;
};
[[group(0), binding(0)]]
var<storage, read_write> numbers: Numbers;
[[stage(compute), workgroup_size(1)]]
fn main()
{
    numbers.data[0] = numbers.data[0] + u32(1);
    numbers.data[1] = numbers.data[1] + u32(1);
    numbers.data[2] = numbers.data[2] + u32(1);
}
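To make the expected behaviour concrete, here is a minimal CPU-side sketch of what I believe the shader should do to the buffer. The helper name `cpu_reference` is hypothetical (it is not part of wgpu), and it only mirrors the three increments above; unlike the shader, it quietly skips elements that are missing from a shorter slice.

```rust
/// CPU-side stand-in for the shader's `main`: add 1 to the first three elements.
/// `cpu_reference` is a hypothetical helper name, not part of wgpu.
fn cpu_reference(data: &mut [u32]) {
    for value in data.iter_mut().take(3) {
        // Unsigned addition in WGSL wraps on overflow, so mirror that here.
        *value = value.wrapping_add(1);
    }
}

fn main() {
    let mut numbers = [1u32, 2, 3];
    cpu_reference(&mut numbers);
    println!("{:?}", numbers); // prints [2, 3, 4]
}
```

This is the result I compare the GPU read-back against.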
As for the wgpu code, it follows the tutorial quite closely.
I get the instance, device, and queue:
let instance = Instance::new(Backends::PRIMARY);
let adapter = block_on(instance
    .request_adapter(&RequestAdapterOptions
    {
        power_preference: PowerPreference::default(),
        compatible_surface: None,
    }))
    .unwrap();
let (device, queue) = block_on(adapter
    .request_device(&Default::default(), None))
    .unwrap();
Compile the shader and make a pipeline:
let shader = device.create_shader_module(&ShaderModuleDescriptor
{
    label: Some("shader"),
    source: ShaderSource::Wgsl(shader_src.into()),
});
let pipeline = device.create_compute_pipeline(&ComputePipelineDescriptor
{
    label: None,
    layout: None,
    module: &shader,
    entry_point: "main",
});
Make the staging and storage buffers. The dbg!(size) prints 12, which should be correct for a 3-element array of 4-byte u32s.
let buffer = [1u32, 2, 3];
let size = std::mem::size_of_val(&buffer) as u64;
dbg!(size);
let staging_buffer = device.create_buffer(&BufferDescriptor
{
    label: None,
    size,
    usage: BufferUsages::MAP_READ | BufferUsages::COPY_DST,
    mapped_at_creation: false,
});
let storage_buffer = device.create_buffer_init(&BufferInitDescriptor
{
    label: Some("storage buffer"),
    contents: cast_slice(&buffer),
    usage: BufferUsages::STORAGE
        | BufferUsages::COPY_DST
        | BufferUsages::COPY_SRC,
});
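The size arithmetic can be checked in isolation with nothing but std. This is only a sketch: `byte_size` is a hypothetical helper of mine, and the constant `COPY_ALIGNMENT` is my assumption about the 4-byte alignment that wgpu expects for buffer copy sizes, hard-coded here rather than taken from the library.

```rust
use std::mem::{size_of, size_of_val};

/// Assumed copy-size alignment in bytes (hard-coded assumption, not read from wgpu).
const COPY_ALIGNMENT: u64 = 4;

/// Hypothetical helper: size in bytes of a slice of u32s.
fn byte_size(buffer: &[u32]) -> u64 {
    size_of_val(buffer) as u64
}

fn main() {
    let buffer = [1u32, 2, 3];
    let size = byte_size(&buffer);
    // Element count times element size should agree with size_of_val.
    assert_eq!(size, (buffer.len() * size_of::<u32>()) as u64);
    // The copy size should be a multiple of the assumed alignment.
    assert_eq!(size % COPY_ALIGNMENT, 0);
    println!("{}", size); // prints 12
}
```

So as far as I can tell, the buffer size itself is not the problem.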
Set up the bind group:
let bg_layout = pipeline.get_bind_group_layout(0);
let bind_group = device.create_bind_group(&BindGroupDescriptor
{
    label: None,
    layout: &bg_layout,
    entries: &[BindGroupEntry
    {
        binding: 0,
        resource: storage_buffer.as_entire_binding(),
    }]
});
Get the encoder and create the compute pass. The copy_buffer_to_buffer call should copy the storage buffer to the staging buffer so I can read it at the end.
let mut encoder = device.create_command_encoder(&CommandEncoderDescriptor
{
    label: None,
});
{
    let mut cpass = encoder.begin_compute_pass(&ComputePassDescriptor
    {
        label: None
    });
    cpass.set_pipeline(&pipeline);
    cpass.set_bind_group(0, &bind_group, &[]);
    cpass.dispatch(1, 1, 1);
}
encoder.copy_buffer_to_buffer(
    &storage_buffer, 0,
    &staging_buffer, 0,
    size);
queue.submit(Some(encoder.finish()));
And then, with the compute pass submitted, map the staging buffer and block for the result:
let buf_slice = staging_buffer.slice(..);
let buf_future = buf_slice.map_async(MapMode::Read);
device.poll(Maintain::Wait);
if let Ok(()) = block_on(buf_future)
{
    let data = buf_slice.get_mapped_range();
    let result = cast_slice::<u8, u32>(&data).to_vec();
    drop(data);
    staging_buffer.unmap();
    println!("{:?}", result);
}
else
{
    println!("error");
}
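To rule out the byte-to-u32 conversion as the source of the zeros, the decoding step can be reproduced without the GPU. This is a sketch using only std: `bytes_to_u32s` is a hypothetical helper standing in for cast_slice (presumably the one from bytemuck, as in the wgpu example), and it assumes the mapped bytes are in native endianness.

```rust
/// Hypothetical helper: reinterpret native-endian bytes as u32s using only std.
/// Trailing bytes that do not fill a whole u32 are ignored.
fn bytes_to_u32s(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|chunk| u32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect()
}

fn main() {
    // Bytes the staging buffer would hold if the shader had run as expected.
    let expected = [2u32, 3, 4];
    let bytes: Vec<u8> = expected.iter().flat_map(|v| v.to_ne_bytes()).collect();
    println!("{:?}", bytes_to_u32s(&bytes)); // prints [2, 3, 4]

    // Twelve zero bytes decode to three zeros, which is what I actually see.
    println!("{:?}", bytes_to_u32s(&[0u8; 12])); // prints [0, 0, 0]
}
```

Since the round trip works on the CPU, I assume the mapped range really does contain twelve zero bytes, rather than the conversion mangling valid data.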
The error case isn't reached, and the program terminates with no errors, but the printed result is always [0, 0, 0], when it should be [2, 3, 4].
What am I doing wrong?