
Please consider the following instruction:

mpyf3 *ar0+, *ar1+, r0 || addf3 r0, r1, r1

The first instruction is equivalent to r0 = *ar0++ * *ar1++ and the second instruction is equivalent to r1 = r0 + r1. However, what's the value of r0 in the second instruction? There are two options:

  1. r0 is the value of r0 before the parallel instruction
  2. r0 is the result of the first instruction

Which one is correct?


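Moreover, how would I parallelize a simple filter such as the following one?

void emg_filter(int const* a0, int* a1)
{
    int const N = ...;           /* number of samples */
    int result = 0;
    for (int n = N; n > 0; --n)  /* sum N samples; N itself stays unchanged */
        result += *a0++;
    *a1 = result / N;            /* average of the N samples */
}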
0xbadf00d

3 Answers


According to the TMS320C3x User's Guide documentation for the MPYF3||ADDF3 instruction:

A floating-point multiplication and a floating-point addition are performed in parallel. All registers are read at the beginning and loaded at the end of the execute cycle. If one of the parallel operations (MPYF3) reads from a register and the operation being performed in parallel (ADDF3) writes to the same register, then MPYF3 accepts the contents of the register as input before it is modified by the ADDF3.

Ross Ridge

Both are done in parallel, so it is option 1: the r0 in addf3 r0, x, x is NOT the output of mpyf3 x, x, r0. Typically you use this in a loop, so if you repeat the instruction several times, each addf3 uses the r0 that mpyf3 computed during the previous cycle.

I'm not sure what your question about the loop is, but you should be able to use a repeat-single-instruction (RPTS). Also, it's better to multiply by the constant (1/N) rather than divide by N.

nicolas

Then I think it is a question of precedence. In most cases evaluation is left-to-right, and I think the multiply would be calculated first, because in C multiplication has one level higher precedence than addition.

Steve Fan
    We're not talking about precedence here or whether the example is sensible or not. It's a question about how parallel instructions are implemented in TMS320C3x assembler. – 0xbadf00d Jul 19 '14 at 16:46