
Let's say I have a function that adds two vectors:

void add256(int* r, int* p, int* q) {
    for (int i=0; i<256; ++i) {
        r[i] = p[i] + q[i];
    }
}

Now suppose I know that `r` is either equal to `p` or does not point into the same array as `p`, and likewise for `q`. Can `restrict` help the compiler optimize this code with parallel (SIMD) add instructions?


I'm asking because, on GCC, the following version

typedef struct { int x[256]; } int256;
void add256t(int256* r, int256* p, int256* q) {
    for (int i=0; i<256; ++i) {
        r->x[i] = p->x[i] + q->x[i];
    }
}

is optimized under exactly these assumptions into the asm I intended. Splitting the function into separate cases, on the other hand, makes the code a mess, and the resulting asm contains separate copies for each case that all do the same thing.
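The precondition above (each of `p` and `q` either equals `r` or does not overlap it at all) can be made explicit with a small runtime check. This is only a sketch: the helper name `same_or_disjoint` and the `N` macro are hypothetical, and addresses are compared through `uintptr_t` because relational comparison of pointers into different objects is undefined in C. The integer conversion itself is implementation-defined, though it behaves as expected on common flat-address-space platforms.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 256  /* vector length used throughout the question */

/* Hypothetical helper: true if the N-int ranges starting at a and b are
 * either the same range or do not overlap at all. Addresses are compared
 * as uintptr_t, which is implementation-defined but works on flat
 * address spaces. */
bool same_or_disjoint(const int *a, const int *b)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(N * sizeof(int));

    if (ua == ub)
        return true;                              /* same array */
    return ua + bytes <= ub || ub + bytes <= ua;  /* no overlap */
}
```

With such a check, `same_or_disjoint(r, p) && same_or_disjoint(r, q)` expresses the question's assumption, for example inside an `assert` in debug builds.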

l4m2
  • Can you reword your question to make it clearer? If you had `void add256(int* restrict r, int* p, int* q)`, for example, you would be assuring the compiler that the object pointed to by `r` is not accessed through any other pointer (since you are modifying it). See https://en.cppreference.com/w/c/language/restrict – gstukelj Nov 03 '19 at 13:17
  • @gst Yet it's possible that `r==p`, in which case the loop can still be parallelized – l4m2 Nov 04 '19 at 01:15
  • Are they pointing to the same address, or do they just contain the same values? If they're pointing to the same address, using restrict will lead to undefined behavior. – gstukelj Nov 04 '19 at 07:51
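To illustrate why the exact-equality case `r == p` is harmless for parallelization while partial overlap is not: with `r == p`, iteration `i` reads and writes only element `i`, so the iterations are independent and can run in any order, several at a time. If `r` instead points one element past `p`, each iteration reads the value the previous one just wrote. A loop that loads several elements of `p` before storing to `r` would compute something different, so the compiler must either prove there is no such overlap or check for it at run time. The driver `partial_overlap_demo` below is hypothetical.

```c
/* The question's function, without restrict. */
void add256(int *r, int *p, int *q)
{
    for (int i = 0; i < 256; ++i) {
        r[i] = p[i] + q[i];
    }
}

/* Hypothetical demo: buf holds 257 ones and ones holds 256 ones.
 * add256(buf + 1, buf, ones) makes r overlap p shifted by one element,
 * so buf[i + 1] = buf[i] + 1 is evaluated sequentially and buf ends up
 * as 1, 2, 3, ..., 257. A version that read all of p before writing r
 * would instead produce 1, 2, 2, ..., 2. */
void partial_overlap_demo(int buf[257], int ones[256])
{
    for (int i = 0; i < 257; ++i)
        buf[i] = 1;
    for (int i = 0; i < 256; ++i)
        ones[i] = 1;
    add256(buf + 1, buf, ones);
}
```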

1 Answer


When you use restrict, you make a promise to the compiler. Breaking that promise leads to undefined behavior.
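For reference, here is a minimal sketch of the fully `restrict`-qualified version, which is only valid when the array written through `r` is not touched through `p` or `q`. The name `add256_noalias` is hypothetical. Note that `p` and `q` may point to the same array, because `restrict` only forbids other accesses to an object that is modified through the restricted pointer, and that array is only read.

```c
/* Hypothetical fully restrict-qualified variant. The caller promises that
 * the array written through r is not accessed through p or q. p and q may
 * refer to the same array, because that array is only read. */
void add256_noalias(int *restrict r, const int *restrict p,
                    const int *restrict q)
{
    for (int i = 0; i < 256; ++i) {
        r[i] = p[i] + q[i];
    }
}
```

`add256_noalias(out, a, b)` and `add256_noalias(out, a, a)` keep the promise, while `add256_noalias(a, a, b)` breaks it, because `a` is modified through `r` and also read through `p`. That in-place call is exactly the `r == p` case from the question, which is why a single `restrict` signature cannot cover it.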

The way I interpret your question is that the pointers either are the same or do not overlap at all. In that case, you can optimize like this:

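/* Case r == p: r and q must not overlap, so both can be restrict. */
void add256_rEQp(int *restrict r, const int *restrict q) {
    for (int i = 0; i < 256; ++i) {
        r[i] += q[i];
    }
}

void add256(int *r, int *p, int *q) {
    if (r == p && r != q) {
        add256_rEQp(r, q);   /* q is disjoint from r by assumption */
    }
    /* else if (...) { ... }  further cases omitted */
    else {
        /* generic fallback: the compiler must assume r, p, q may overlap */
        for (int i = 0; i < 256; ++i) {
            r[i] = p[i] + q[i];
        }
    }
}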

But of course, you should benchmark to see whether it actually improves performance. After all, the runtime pointer checks do introduce a bit of overhead.
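A rough way to do that comparison is to time many repetitions of each version with `clock()` from `<time.h>`. This is only a sketch: `time_in_place` is a hypothetical helper, the repetition count is up to the caller, and for real measurements you would also inspect the generated asm and repeat the runs.

```c
#include <time.h>

/* Same in-place kernel as in the answer: r and p are the same array,
 * and q does not overlap it. */
void add256_rEQp(int *restrict r, const int *restrict q)
{
    for (int i = 0; i < 256; ++i)
        r[i] += q[i];
}

/* Hypothetical timing helper: calls add256_rEQp(r, q) reps times and
 * returns the elapsed processor time in seconds. The results are written
 * to the caller's array, which helps keep the work from being optimized
 * away entirely. */
double time_in_place(int *r, const int *q, long reps)
{
    clock_t start = clock();
    for (long k = 0; k < reps; ++k)
        add256_rEQp(r, q);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing the generic three-pointer loop the same way and comparing the two figures gives a first indication of whether the dispatch pays off.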

klutt
  • With the code at https://gcc.godbolt.org/z/7XWFrJ I get the correct code. GCC probably always generates correct code in this situation, but IIRC it's UB, and I want to know a better way. – l4m2 Nov 04 '19 at 13:35
  • @l4m2 `void add256(int256& r, int256& p, int256& q)` is not C. It's C++. – klutt Nov 04 '19 at 13:38
  • The language doesn't matter much here; it's the same in C: https://gcc.godbolt.org/z/KlD7bf – l4m2 Nov 04 '19 at 13:50