Please consider the following instruction:
mpyf3 *ar0+, *ar1+, r0 || addf3 r0, r1, r1
The first instruction is equivalent to r0 = *ar0++ * *ar1++ and the second to r1 = r0 + r1. However, which value of r0 does the second instruction use? There are two options:
1. r0 is the value of r0 before the parallel instruction.
2. r0 is the result of the first instruction.
Which one is correct?
Moreover, how would I parallelize a simple filter such as the following?
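    void emg_filter(int const* a0, int* a1)
    {
        int const N = ...;           /* number of samples to average */
        int result = 0;
        for (int n = N; n > 0; --n)  /* N is const, so count down with a copy */
            result += *a0++;
        *a1 = result / N;            /* N still holds the sample count here */
    }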