Let's say I have a function that adds two vectors:
    void add256(int* r, int* p, int* q) {
        for (int i=0; i<256; ++i) {
            r[i] = p[i] + q[i];
        }
    }
Now, if I know that r is either exactly p or does not overlap p's array at all, and that the same holds for q, can restrict help the compiler optimize this code using parallel (SIMD) add instructions?
I ask because, on GCC, the following version
    typedef struct { int x[256]; } int256;
    void add256t(int256* r, int256* p, int256* q) {
        for (int i=0; i<256; ++i) {
            r->x[i] = p->x[i] + q->x[i];
        }
    }
is optimized under exactly the conditions I assumed and compiles to the asm I intended. Splitting the plain-pointer version by hand into separate cases, however, makes the code a mess, and the resulting asm has separate paths for each case that all do the same thing.
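To make the "separate cases" version concrete, here is a minimal sketch of what I mean by splitting the function by hand. The helper names (add256_disjoint, add256_accum, add256_double, add256_split) are made up for illustration, and the sketch assumes r is either identical to or completely disjoint from each input. restrict is only applied where that assumption makes it valid.

```c
/* Hypothetical helper: r overlaps neither p nor q.
   p and q are only read, so they may alias each other. */
static void add256_disjoint(int *restrict r, const int *p, const int *q) {
    for (int i = 0; i < 256; ++i) {
        r[i] = p[i] + q[i];
    }
}

/* Hypothetical helper: r is one of the inputs, the other input s is disjoint from r. */
static void add256_accum(int *restrict r, const int *s) {
    for (int i = 0; i < 256; ++i) {
        r[i] += s[i];
    }
}

/* Hypothetical helper: r, p and q are all the same array. */
static void add256_double(int *r) {
    for (int i = 0; i < 256; ++i) {
        r[i] += r[i];
    }
}

/* Dispatch on the aliasing case. Assumes each input is either exactly r
   or does not overlap r at all (no partial overlap). */
void add256_split(int *r, int *p, int *q) {
    if (r == p && r == q) {
        add256_double(r);
    } else if (r == p) {
        add256_accum(r, q);
    } else if (r == q) {
        add256_accum(r, p);
    } else {
        add256_disjoint(r, p, q);
    }
}
```

Every branch computes the same element-wise sum, which is why I would prefer a single loop that the compiler can vectorize under the same assumption.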