I am trying to find material that clearly explains the different ways to write C/C++ source code that the Intel compiler can vectorize using array notation and elemental functions. All the material I can find online uses trivial examples: saxpy, reductions, etc. What is missing is an explanation of how to vectorize code that has conditional branching or contains a loop with a loop-carried dependence.
For example, say there is sequential code that I want to run over different arrays. A matrix is stored in row-major format, and each column of the matrix is summed by the compute_seq() function:
#include <stdlib.h>

#define N 256
#define STRIDE 256

/* Sums one column: N elements spaced STRIDE floats apart, starting at a. */
__attribute__((vector))
inline void compute_seq(float *sum, float *a) {
    int i;
    *sum = 0.0f;
    for (i = 0; i < N; i++)
        *sum += a[i*STRIDE];
}

int main() {
    // Initialize
    float *A = malloc(N*N*sizeof(float));
    float sums[N];
    // The following line is not going to be valid, but I would like to do something like this:
    compute_seq(sums[:],*(A[0:N:1]));
    free(A);
}
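To make the intended result concrete, here is a minimal sketch in plain C99, without array notation or elemental functions. The name column_sums and its parameters are hypothetical and exist only for illustration. The sketch assumes the input and output arrays do not overlap (hence restrict). The loops are arranged so that the inner loop walks across columns with unit stride and each iteration updates a different sums[j], which is the form a vectorizing compiler can usually handle. Whether a given compiler actually vectorizes it is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums[j] = a[0*stride + j] + ... + a[(rows-1)*stride + j].
 * The matrix is row-major; stride is the distance in floats between rows.
 * Assumption: a and sums do not alias, which restrict states to the compiler. */
void column_sums(const float *restrict a, float *restrict sums,
                 size_t rows, size_t cols, size_t stride)
{
    assert(stride >= cols);  /* a row must fit within one stride */

    for (size_t j = 0; j < cols; j++)
        sums[j] = 0.0f;

    /* Outer loop over rows, inner loop over columns: the inner loop reads
     * contiguous memory and carries no dependence between iterations. */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sums[j] += a[i * stride + j];
}
```

With the definitions from the code above, the call would be column_sums(A, sums, N, N, STRIDE). This gives the same values as calling compute_seq(&sums[j], &A[j]) for every column j, but it is not the elemental-function form I am asking about.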
Any comments appreciated.