Either equal or not overlapped

Question

Say I have a function that adds two vectors:

void add256(int* r, int* p, int* q) {
    for (int i=0; i<256; ++i) {
        r[i] = p[i] + q[i];
    }
}

Now, if I know that r is either equal to p or does not overlap p's array at all, and likewise for q, can restrict help the compiler optimize the code with parallel (SIMD) add instructions?

I ask because, on GCC,

typedef struct { int x[256]; } int256;
void add256t(int256* r, int256* p, int256* q) {
    for (int i=0; i<256; ++i) {
        r->x[i] = p->x[i] + q->x[i];
    }
}

is optimized under exactly the conditions I assumed and compiles to the asm I intended, whereas splitting the code into separate cases makes the source a mess and produces separate asm paths that all do the same thing.

Can you reword your question to make it clearer? If you had `void add256(int* restrict r, int* p, int* q)`, for example, you would be assuring the compiler that the object pointed to by `r` is not aliased by any other pointer (since you are modifying it). See https://en.cppreference.com/w/c/language/restrict – gstukelj, Nov 03 '19 at 13:17
@gst Yet it's possible that `r==p`, in which case it's still possible to parallelize – l4m2, Nov 04 '19 at 01:15
Are they pointing to the same address, or do they just contain the same values? If they're pointing to the same address, using restrict will lead to undefined behavior. – gstukelj, Nov 04 '19 at 07:51

klutt · Answer 1 · 2019-11-04T13:22:06.977

When you use restrict, you make a promise to the compiler. Breaking that promise leads to undefined behavior.

The way I interpret your question is that the pointers are either the same or do not overlap at all. In that case, you can optimize like this:

void add256_rEQp(int *restrict r, int *restrict q) {
    for (int i=0; i<256; ++i) {
        r[i] += q[i];
    }
}

void add256(int* r, int* p, int* q) {
    if(r == p && r != q)
        add256_rEQp(r, q);
    else if( ...
    else {
        for (int i=0; i<256; ++i) {
            r[i] = p[i] + q[i];
        }
    }
}

But of course, you should run tests to see if it improves performance. After all, this does introduce a bit of overhead.
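As a rough illustration, here is one way the full dispatch could be spelled out. It is only a sketch and not necessarily what the elided branch above was meant to contain. It assumes the question's precondition holds (each of p and q is either identical to r or does not overlap it at all), and the helper names `add_disjoint`, `add_into`, `add_self` and `add256_dispatch` are hypothetical.

```c
#include <stddef.h>

enum { N = 256 };  /* vector length, taken from the question */

/* Assumed case: r overlaps neither p nor q.
   p and q may alias each other, since both are only read. */
static void add_disjoint(int *restrict r, const int *p, const int *q) {
    for (size_t i = 0; i < N; ++i)
        r[i] = p[i] + q[i];
}

/* Assumed case: r is one of the operands and s, the other operand,
   does not overlap r. */
static void add_into(int *restrict r, const int *restrict s) {
    for (size_t i = 0; i < N; ++i)
        r[i] += s[i];
}

/* Assumed case: r, p and q are all the same array. No restrict is needed,
   because only one pointer is involved. */
static void add_self(int *r) {
    for (size_t i = 0; i < N; ++i)
        r[i] += r[i];
}

/* Hypothetical dispatcher: picks the variant whose restrict promise is
   actually true for the given arguments. */
void add256_dispatch(int *r, int *p, int *q) {
    if (r == p && r == q)
        add_self(r);
    else if (r == p)
        add_into(r, q);
    else if (r == q)
        add_into(r, p);
    else
        add_disjoint(r, p, q);
}
```

Each helper only makes a restrict promise that is true in the branch that calls it, so no call breaks the promise as long as the precondition holds. Whether the compiler vectorizes every variant still has to be checked in the generated asm.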

edited Nov 04 '19 at 13:22

answered Nov 04 '19 at 13:04

If I write https://gcc.godbolt.org/z/7XWFrJ , I get the correct code. It's likely that GCC always generates correct code in such a situation, but IIRC it's UB, and I want to know a better way – l4m2 Nov 04 '19 at 13:35
@l4m2 `void add256(int256& r, int256& p, int256& q)` is not C. It's C++. – klutt Nov 04 '19 at 13:38
Same in C: https://gcc.godbolt.org/z/KlD7bf (the language doesn't matter much) – l4m2 Nov 04 '19 at 13:50
