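#include <stdio.h>

int main()
{
    const int STRIDE = 2, SIZE = 8192;
    int i = 0;
    double u[SIZE][STRIDE];
    /* Only the last column of each row is written: a stride-2 store. */
    #pragma vector aligned
    for (i = 0; i < SIZE; i++)
    {
        u[i][STRIDE-1] = i;
    }
    printf("%lf\n", u[7][STRIDE-1]);
    return 0;
}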
The compiler uses xmm registers here. The access has a stride of 2, and I want the compiler to ignore that: do a regular (contiguous) load from memory and then mask out alternate elements, so that I would be using only 50% of each SIMD register. I need intrinsics that can load a register and then apply a bitwise mask to it before storing it back to memory.
P.S. I have never written assembly code before.
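
To make the load / mask / store idea above concrete, here is a minimal sketch using SSE2 intrinsics from `<emmintrin.h>`. It assumes an x86 target with SSE2 and 128-bit xmm registers, so one row `u[i][0..1]` fills exactly one register. The helper name `fill_odd_lane_masked` and its parameters are hypothetical, chosen only for illustration, and unaligned loads and stores are used so that no alignment of `u` is assumed.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_loadu_pd, _mm_and_pd, ... */

/* Hypothetical helper: u points to rows*2 doubles laid out as u[rows][2].
   For each row, lane 0 is kept as it is in memory and lane 1 is set to i. */
void fill_odd_lane_masked(double *u, int rows)
{
    /* Bit mask: lane 1 (high 64 bits) all ones, lane 0 (low 64 bits) all zeros. */
    const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(-1, -1, 0, 0));
    int i;

    for (i = 0; i < rows; i++) {
        __m128d old  = _mm_loadu_pd(u + 2 * i);   /* contiguous load of u[i][0], u[i][1] */
        __m128d val  = _mm_set1_pd((double)i);    /* new value broadcast to both lanes   */
        __m128d keep = _mm_andnot_pd(mask, old);  /* (~mask) & old: only lane 0 survives */
        __m128d put  = _mm_and_pd(mask, val);     /* mask & val:    only lane 1 survives */
        _mm_storeu_pd(u + 2 * i, _mm_or_pd(keep, put)); /* merge and store the full row  */
    }
}
```

With the array from the question, this could be called as `fill_odd_lane_masked(&u[0][0], SIZE);`. Note that it reads and rewrites the whole row, so column 0 should hold initialized data first. Whether this is faster than the scalar strided store is not something the sketch demonstrates; it only shows which intrinsics express the pattern.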