I'm working on an assignment where I have to convert a snippet of C code to inline assembly. The code is part of a program that renders a Julia fractal.

I've tested the outputs of both code snippets and they match exactly, yet my program still outputs a different image (a proper Julia fractal for the C code, a flat pink screen for the inline assembly code).

This is the start of the function, along with its return statement:

COLORREF render_point(const double &a, 
                      const double &b, int N) {
  double cRe = -0.5;
  double cIm = -0.05;
  double x(a), y(b);
  double norm = x*x+y*y;
  int n;
  double three = 3.0;

  (loop goes here)

  return HSVtoRGB(n % 256, 255 , 255 *(n<N));
}

Here's the C code:

for (n = 0; norm < 4.0 && n < N; ++n) 
{
    double old_x = x;
    double old_y = y;

    x = (old_x * old_x * old_x) - (3 * old_y * old_y * old_x) + cRe;
    y = (3 * old_y * old_x * old_x) - (old_y * old_y * old_y) + cIm;

    norm = x*x+y*y;
}
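
For reference, the update in this loop is the complex map z → z³ + c with z = x + iy and c = cRe + i·cIm, written out in real and imaginary parts. The sketch below cross-checks the expanded form against `std::complex`; the function names `step_expanded` and `step_complex` are made up for this illustration and are not part of the original program.

```cpp
#include <complex>

// One step of the loop body, expanded into real and imaginary parts
// exactly as in the C code above.
std::complex<double> step_expanded(double x, double y, double cRe, double cIm) {
    double new_x = (x * x * x) - (3 * y * y * x) + cRe;
    double new_y = (3 * y * x * x) - (y * y * y) + cIm;
    return {new_x, new_y};
}

// The same step written as z*z*z + c with std::complex.
std::complex<double> step_complex(double x, double y, double cRe, double cIm) {
    std::complex<double> z(x, y);
    std::complex<double> c(cRe, cIm);
    return z * z * z + c;
}
```

The two functions should agree up to floating-point rounding, which gives a handy reference value when checking a hand-written assembly version one step at a time.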

And here's the inline assembly version:

for (n = 0; norm < 4.0 && n < N; ++n) 
  {
    __asm {
        // Create (old_x * old_x * old_x)
        fld x;
        fmul x;
        fmul x;

        // Create (3 * old_y * old_y * old_x)
        fld three;
        fmul y;
        fmul y;
        fmul x;

        // Create the full equation for x
        fsubp st(1), st(0);
        fadd cRe;

        // Create (3 * old_y * old_x * old_x) + cIm
        fld three;
        fmul y;
        fmul x;
        fmul x;
        fadd cIm;

        // Create (old_y * old_y * old_y)
        fld y;
        fmul y;
        fmul y;

        fsubp st(1), st(0); // Create the full equation for y

        fst y;              // Store in y to use for next loop
        fmul st(0), st(0);  // Get y*y

        fxch st(1);         // Swap places of y*y with newly calculated x
        fst x;              // Store in x to use for next loop

        fmul st(0), st(0);  // Get x*x

        faddp st(1), st(0); // Get x*x + y*y
        fst norm;           // Set loop variable
    }
  }

Is there a difference between the two loops that might result in a differing output in the program?

Michael Petch
ozma
  • Have you stepped through with the debugger and observed? – Michael Petch Sep 24 '17 at 04:57
  • At the end of your assembly loop, you still have one value on the FPU stack. The last instruction should be `fstp norm`. – 1201ProgramAlarm Sep 24 '17 at 05:22
  • @1201ProgramAlarm Yeah, that fixed it! I didn't know I couldn't leave things on the FPU stack. Case closed, I suppose. – ozma Sep 24 '17 at 05:35
  • You should *really* write the whole loop in asm. Storing at the end of one asm block reloading at the start of the next iteration introduces an extra ~6 cycles of latency (http://agner.org/optimize/). Also, holy crap every instruction you're using has a memory operand, instead of `fld y` / `fld st(0)` / `fmul st(0)` / `fmulp st(0)`. (cubing is always clunky with destructive instructions. With AVX, you could `vmovsd xmm0, y` / `vmulsd xmm1, xmm0,xmm0` / `vmulsd xmm1, xmm1,xmm0`). With AVX+FMA, you could make this a lot more efficient. – Peter Cordes Sep 24 '17 at 11:36

1 Answer

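As 1201ProgramAlarm mentioned in a comment, I just had to pop the remaining value (`norm`) off the FPU stack at the end of each loop iteration, by changing the final `fst norm` to `fstp norm`. Otherwise every iteration leaves one more value on the eight-register x87 stack until it overflows.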
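
To make the failure mode concrete, here is a rough toy model in portable C++ (not inline assembly, and not an emulator). It assumes the usual masked-exception behaviour, where a load onto a full x87 stack produces a NaN instead of the requested value. `ToyX87Stack` and `first_nan_iteration` are hypothetical names invented for this sketch. The loop body peaks at two temporaries per iteration, like the block in the question, and ends with either an `fst`-style store (value stays) or an `fstp`-style store (value is popped).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Toy stand-in for the x87 register stack (illustration only).
class ToyX87Stack {
public:
    static constexpr std::size_t kCapacity = 8;  // the x87 has eight registers

    // Like fld: pushing onto a full stack yields NaN instead of the value.
    void push(double v) {
        if (slots_.size() == kCapacity) {
            slots_.erase(slots_.begin());  // simplification: drop the oldest slot
            v = std::numeric_limits<double>::quiet_NaN();
        }
        slots_.push_back(v);
    }

    // Like the "p" forms (fstp, faddp): remove and return the top value.
    double pop() {
        assert(!slots_.empty());
        double v = slots_.back();
        slots_.pop_back();
        return v;
    }

    // Like fst: read the top value but leave it on the stack.
    double top() const {
        assert(!slots_.empty());
        return slots_.back();
    }

private:
    std::vector<double> slots_;
};

// Returns the 1-based iteration in which the stored result first becomes
// NaN, or -1 if it never does within max_iterations.
int first_nan_iteration(bool pop_at_end, int max_iterations) {
    ToyX87Stack fpu;
    for (int i = 1; i <= max_iterations; ++i) {
        fpu.push(1.0);
        fpu.push(2.0);
        double rhs = fpu.pop();
        double lhs = fpu.pop();
        fpu.push(lhs + rhs);  // faddp-style combine: two values become one
        double stored = pop_at_end ? fpu.pop() : fpu.top();  // fstp vs fst
        if (std::isnan(stored)) return i;
    }
    return -1;
}
```

In this model, `first_nan_iteration(false, 100)` reports a NaN in the eighth iteration, while `first_nan_iteration(true, 100)` never does. In the real program a NaN `norm` makes `norm < 4.0` false, so the loop would exit early with the same small `n` for every pixel, which would be consistent with the flat single-colour image.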

ozma