The PTX documentation states that the bar.sync and bar.arrive barrier synchronization instructions can be used as follows:
bar.sync a{, b};
bar.arrive a, b;
where:
Source operand a specifies a logical barrier resource as an immediate constant or register with value 0 through 15. Operand b specifies the number of threads participating in the barrier.
It also shows an example where a producer-consumer model is established using these instructions:
// Producer code places produced value in shared memory.
st.shared [r0],r1;
bar.arrive 0,64;
...
// Consumer code, reads value from shared memory
bar.sync 0,64;
ld.shared r1,[r0];
...
I do not quite get the purpose of operand b in bar.arrive. In bar.sync, this operand controls the number of threads involved in the barrier, and the instruction waits until that thread count is reached. Since bar.arrive does not wait, the purpose of its thread count is not clear to me.
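
To make the setup concrete, here is how I would expect the documentation example to look from CUDA C++ with inline PTX. This is only a minimal sketch and not a confirmed answer: the wrapper names named_bar_arrive and named_bar_sync, the use of barrier 1 instead of 0 (since __syncthreads() compiles to a barrier on resource 0), and the split into one producer warp and one consumer warp are all my own assumptions. If I read the documentation correctly, the 64 given to both instructions would then be the total number of threads the barrier expects, counting the 32 arriving threads together with the 32 waiting ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrappers around the two PTX instructions (names are illustrative).
__device__ __forceinline__ void named_bar_arrive(int id, int nthreads) {
    asm volatile("bar.arrive %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

__device__ __forceinline__ void named_bar_sync(int id, int nthreads) {
    asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(nthreads) : "memory");
}

constexpr int kWarp = 32;
constexpr int kBarrierId = 1;      // assumption: avoid barrier 0, which __syncthreads() uses
constexpr int kParticipants = 64;  // assumption: 32 producer threads + 32 consumer threads

__global__ void producer_consumer(int *out) {
    __shared__ int slot[kWarp];
    const int lane = threadIdx.x % kWarp;

    // The branch is uniform per warp: warp 0 produces, warp 1 consumes.
    if (threadIdx.x < kWarp) {
        // Producer: place the value in shared memory, then signal arrival
        // without waiting for the consumer warp.
        slot[lane] = 2 * lane;
        __threadfence_block();  // conservative fence before signalling
        named_bar_arrive(kBarrierId, kParticipants);
    } else {
        // Consumer: wait until all 64 participants have reached the barrier,
        // then read what the producer wrote.
        named_bar_sync(kBarrierId, kParticipants);
        out[lane] = slot[lane];
    }
}

int main() {
    int h_out[kWarp];
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_out));

    // One block of 64 threads: exactly the two warps assumed above.
    producer_consumer<<<1, kParticipants>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Check that every consumer lane saw the value its producer lane wrote.
    bool ok = true;
    for (int i = 0; i < kWarp; ++i) ok = ok && (h_out[i] == 2 * i);
    printf("%s\n", ok ? "values match" : "mismatch");

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

In this sketch both instructions name the same barrier and the same count, so what I am really asking is whether bar.arrive needs b only so that it agrees with the count given to bar.sync, or whether it has some further effect.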