How can I set values at multiple memory locations, atomically?

Question

the CUDA Programming Guide said that any atomic operation can be implemented using atomicCAS(), and gives an example of atomic double add:

__device__ float single(double *address,double val)
{
unsigned long long int *address_as_ull =(unsigned long long int*)address;
unsigned long long int assumed;
unsigned long long int old = *address_as_ull;

do
{
    assumed = old;
    old = atomicCAS(address_as_ull,assumed,__double_as_longlong(val + __longlong_as_double(assumed)));
}while(assumed !=old);
   return __longlong_as_double(old);
}

now,I face the problem that:

I want to write a function that can operate two variables address atomically.

for example: atomic add about two variable

input

double *address_1, int *address_2
double val_1,int val_2

result

*address_1 = *address_1+val_1;
*address_2 = *address_2+val_2;

how can I deal with the problem? thanks.

What is wrong with calling the atomic addition function twice? From the description in your question, the two operations are independent. — talonmies, Jan 11 '19 at 07:52
here is an simple example, actually in this case do the operation twice is ok ~ but in my situation, i need the atomic function with two address. do you have any suggestion? thanks~ — Xu Hui, Jan 11 '19 at 08:00
I think using a structure to store the two variable may be help. you see the double is transfer into ull, but I have no idea how to deal with the unsigned long long int and structure. — Xu Hui, Jan 11 '19 at 08:07

einpoklum · Accepted Answer · 2019-01-18T13:37:27.347

In general, you cannot do this. The hardware does not support atomic changes to multiple locations in memory. While you can circumvent this if both variables are small enough to fit into the size of a single atomic operation - this approach will fail if you have more than 8 bytes overall. You will encounter the "too much milk" problem.

One thing you could do is have some kind of synchronization protocol for accessing these two values. For example, you could use a mutex, which only one thread can obtain, to safely know that nobody else is changing the values while that thread is working on them. See: Avoid taking a long time to finish the 'too much milk' scenario.

Of course, this is quite expensive in a GPU setting. You had probably better do one of the following (by increasing order of favorability):

Use a pointer or an index into a larger array, and instead of changing the structure atomically, switch the pointer atomically. This resolves the concurrency issue, but makes accesses slower.
Change your algorithm so that accesses can be separated and don't have to happen atomically.
Change your algorithm further, to avoid having multiple threads writing to a single complex data structure.

Thanks you for your clarity~ The material you attached really helped me~ — Xu Hui, Jan 18 '19 at 06:52

score 1 · Answer 2 · answered Jan 11 '19 at 08:08

1

I think you're missing the point of the operation implemented here. In a+=b, the logical operation is a = a + b, but with CAS you avoid spurious changes to a between its read and its write. b is used once and not a problem.

In a = b + c, none of the values appear twice, so there's no need to protect against any changes in between.

answered Jan 11 '19 at 08:08

MSalters

173,980
10
155
350

thank you for your answer. I want to write a atomic code which can modify two variables atomically. do you have any suggestion? – Xu Hui Jan 11 '19 at 08:24
thank you for your clarify~ now i have the solution~ – Xu Hui Jan 11 '19 at 08:56

score 0 · Answer 3 · answered Jan 11 '19 at 08:54

thanks all the guys reply me! I now have the solution. we can combine the two variable into a structure. so we can transfer "two variable with two address" into "one structure with one address". here is the code:

#include <stdio.h>
struct pair_t
{
    float x;
    int y;
};

__device__ float single(double *address,double val)
{   

    unsigned long long int *address_as_ull =(unsigned long long int*)address;
    unsigned long long int assumed;
    unsigned long long int old = *address_as_ull;

    do
    {
        assumed = old;
        old = atomicCAS(address_as_ull,assumed,__double_as_longlong(val + __longlong_as_double(assumed)));
    }while(assumed !=old);
    return __longlong_as_double(old);
}



__device__ void myadd(pair_t *address, double val_1 ,int val_2)
{   
    union myunion
    {  
        pair_t p;
        unsigned long long int ull;
    };

    unsigned long long int *address_as_ull;
    address_as_ull = (unsigned long long int *)address;

    union myunion assumed;
    union myunion old_value;
    union myunion new_value;

    old_value.p = *(pair_t *)address_as_ull;

    do
    {
        assumed = old_value;
        // cirtical area begin--------------------
        new_value.p.x = assumed.p.x+val_1;
        new_value.p.y = assumed.p.y+val_2;
        // cirtical area end----------------------

        old_value.ull = atomicCAS(address_as_ull,assumed.ull,new_value.ull);
    }while(assumed.ull !=old_value.ull);
}


__global__ void kernel (pair_t *p)
{
    myadd(p,1.5,2);
}

int main()
{
    pair_t p;
    p.x=0;
    p.y=0;
    pair_t *d_p = NULL;
    cudaMalloc((pair_t **)&d_p, sizeof(pair_t));
    cudaMemcpy(d_p, &p, sizeof(pair_t), cudaMemcpyHostToDevice);

    kernel<<<100, 100>>>(d_p);

    cudaMemcpy(&p, d_p, sizeof(pair_t), cudaMemcpyDeviceToHost);

    cudaDeviceSynchronize();
    printf("x=%lf\n", p.x);
    printf("y=%d\n", p.y);
    cudaDeviceReset();
    return 0;
}

and the solution is

x=15000.000000
y=20000

now everything will be ok~

1. This is convoluted. You don't have to go through doubles at all. 2. More generally - this only works since the total size of the structure is small enough for you to act on it atomically. — einpoklum, Jan 11 '19 at 21:07

How can I set values at multiple memory locations, atomically?

3 Answers3