I have something like this:
for (b=from; b<to; b++)
{
    for (a=from2; a<to2; a++)
    {
        dest->ac[b] += srcvec->ac[a] * srcmatrix->weight[a+(b+from)*matrix_width];
    }
}
that I'd like to parallelize using Cilk. I have written the following code:
for (b=from; b<to; b++)
{
    dest->ac[b] += __sec_reduce_add(srcvec->ac[from2:to2-from2] * (srcmatrix->weight+(b*matrix_width))[from2:to2-from2]);
}
The thing is, I could use a cilk_for on the outer loop, but if the reduce operation is already spawning threads, won't the cilk_for add to the threading overhead and slow the whole thing down? And should I add restrict to the dest and src arguments to further help the compiler, or is it implicit in this case?
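To make the restrict part of the question concrete, here is a minimal plain C sketch of what I have in mind. It is only an illustration under some assumptions: the function name matvec_acc is made up, plain float arrays stand in for my dest/srcvec/srcmatrix structs, and the row offset is b * matrix_width as in my second snippet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loops above, not my real code.
 * restrict tells the compiler that dest, src and weight never overlap. */
void matvec_acc(float *restrict dest,
                const float *restrict src,
                const float *restrict weight,
                size_t from, size_t to,
                size_t from2, size_t to2,
                size_t matrix_width)
{
    assert(from <= to && from2 <= to2);

    for (size_t b = from; b < to; b++) {
        /* Assumed row layout: row b starts at weight + b * matrix_width. */
        const float *row = weight + b * matrix_width;
        float sum = 0.0f;  /* accumulate locally, store to dest once per row */

        for (size_t a = from2; a < to2; a++)
            sum += src[a] * row[a];

        dest[b] += sum;
    }
}
```

Each iteration of the outer loop writes only to its own dest[b], so that is the loop I would turn into a cilk_for.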
(PS: I can't try the code right now because of
internal compiler error: in find_rank, at c-family/array-notation-common.c:244
on
neu1b->ac[0:layer1_size]=neu1->ac[0:layer1_size];
which I am also trying to solve.)