
A few years back, I got an assignment at school where I had to parallelize a raytracer.
It was an easy assignment, and I really enjoyed working on it.

Today, I felt like profiling the raytracer, to see if I could get it to run any faster (without completely overhauling the code). During the profiling, I noticed something interesting:

    // Sphere.Intersect
    public bool Intersect(Ray ray, Intersection hit)
    {
        double a = ray.Dir.x * ray.Dir.x +
                   ray.Dir.y * ray.Dir.y +
                   ray.Dir.z * ray.Dir.z;
        double b = 2 * (ray.Dir.x * (ray.Pos.x - Center.x) +
                        ray.Dir.y * (ray.Pos.y - Center.y) +
                        ray.Dir.z * (ray.Pos.z - Center.z));
        double c = (ray.Pos.x - Center.x) * (ray.Pos.x - Center.x) +
                   (ray.Pos.y - Center.y) * (ray.Pos.y - Center.y) +
                   (ray.Pos.z - Center.z) * (ray.Pos.z - Center.z) - Radius * Radius;

        // more stuff here
    }

According to the profiler, 25% of the CPU time was spent on get_Dir and get_Pos, which is why I decided to optimize the code in the following way:

    // Sphere.Intersect
    public bool Intersect(Ray ray, Intersection hit)
    {
        Vector3d dir = ray.Dir, pos = ray.Pos;
        double xDir = dir.x, yDir = dir.y, zDir = dir.z,
               xPos = pos.x, yPos = pos.y, zPos = pos.z,
               xCen = Center.x, yCen = Center.y, zCen = Center.z;

        double a = xDir * xDir +
                   yDir * yDir +
                   zDir * zDir;
        double b = 2 * (xDir * (xPos - xCen) +
                        yDir * (yPos - yCen) +
                        zDir * (zPos - zCen));
        double c = (xPos - xCen) * (xPos - xCen) +
                   (yPos - yCen) * (yPos - yCen) +
                   (zPos - zCen) * (zPos - zCen) - Radius * Radius;

        // more stuff here
    }

With astonishing results.

In the original code, running the raytracer with its default arguments (create a 1024x1024 image with only direct lighting and without AA) would take ~88 seconds.
In the modified code, the same would take a little less than 60 seconds.
I achieved a speedup of ~1.5 with only this little modification to the code.

At first, I thought the getters for Ray.Dir and Ray.Pos were doing some work behind the scenes that would slow the program down.

Here are the getters for both:

    public Vector3d Pos
    {
        get { return _pos; }
    }

    public Vector3d Dir
    {
        get { return _dir; }
    }

So, both return a Vector3d, and that's it.

I really wonder how calling the getter could take that much longer than accessing the variable directly.

Is it because of the CPU caching variables? Or maybe the overhead from calling these methods repeatedly added up? Or maybe the JIT handles the latter case better than the former? Or maybe there's something else I'm not seeing?

Any insights would be greatly appreciated.

Edit:

As @MatthewWatson suggested, I used a Stopwatch to time release builds outside of the debugger. In order to get rid of noise, I ran the tests multiple times. As a result, the former code takes ~21 seconds (between 20.7 and 20.9) to finish, whereas the latter takes only ~19 seconds (between 19 and 19.2).
The difference has become negligible, but it is still there.
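
For reference, a measurement of this kind can be sketched roughly as follows. This is only a minimal illustration: `TimingSketch`, `TimeRuns`, and the placeholder workload are hypothetical names made up for the example, and the real test would call the raytracer's render routine instead. It should be built in release mode and run outside the debugger.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Runs `work` the given number of times and returns the elapsed
    // wall-clock seconds of each individual run.
    public static double[] TimeRuns(Action work, int runs)
    {
        var seconds = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            work();
            sw.Stop();
            seconds[i] = sw.Elapsed.TotalSeconds;
        }
        return seconds;
    }

    static void Main()
    {
        // Placeholder workload (an assumption for this sketch);
        // substitute the actual rendering call here.
        Action work = () =>
        {
            double sum = 0;
            for (int i = 0; i < 10000000; i++) sum += Math.Sqrt(i);
            GC.KeepAlive(sum); // keep the result observable
        };

        foreach (double t in TimeRuns(work, 5))
            Console.WriteLine("{0:F3} s", t);
    }
}
```

Repeating the run several times and looking at the spread, rather than a single number, is what makes a two-second difference distinguishable from noise.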

Nolonar
  • Are you using a Stopwatch to time a release build which you run from outside any debugger? (If not, you should!) Release builds should inline simple getters, but debug builds won't. – Matthew Watson May 27 '13 at 20:12
  • @MatthewWatson I do use a StopWatch, but the 88 and 60 seconds come from the profiler `Analyze -> Start Performance Analysis Alt+F2` -> `Wall Clock Time (Seconds)` using a debug build. – Nolonar May 27 '13 at 20:15
  • You'll have to time a release build to get a proper result. Like I said, debug builds don't inline simple property getters and setters and release builds do. (I'm talking about the Jitter btw, not the IL code.) That's going to make a *huge* difference to this particular code, I think. – Matthew Watson May 27 '13 at 20:18
  • @MatthewWatson Thanks for the tip. I did as you suggested and got some more consistent results. With optimization, the code takes **~19** seconds to run, while without it takes **~21** seconds. There is still a difference, albeit a negligible one. – Nolonar May 27 '13 at 20:22
  • Are `Ray` and/or `Vector3d` classes? One reason could be null checking before the invocation of instance members. – Mike Zboray May 27 '13 at 20:43
  • Perhaps also try moving the `(xPos - xCen)` (and related) expressions into local variables as well. No sense doing the calculation 3 times over. EDIT: If they're structs, it can take longer to retrieve them via the property because it will copy the struct to an entirely new one each time you hit the property. (remember, a property is just a method) – Chris Sinclair May 27 '13 at 20:50
  • @mikez `Ray` is a class, while `Vector3D` is a struct. – Nolonar May 27 '13 at 20:53
  • @ChrisSinclair `Vector3D` is indeed a struct. Also, thanks for suggesting the `xPos - xCen` part. I must be lacking a lot of sleep, if I didn't notice something this obvious. – Nolonar May 27 '13 at 20:55
  • You are doing two optimisations here, not just one: yes, you avoid the getter, as you said, but more importantly you avoid an indirection when retrieving the 'dir' from 'ray'. I would bet that the latter is more important. – GameAlchemist May 27 '13 at 21:10
  • Ten percent change in performance is *negligible*? Ten percent change in performance is *enormous* in any performance analysis I've ever done. – Eric Lippert May 28 '13 at 04:49
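
Chris Sinclair's suggestion of hoisting the `(xPos - xCen)` differences into locals could look roughly like the sketch below. The `Vector3d` struct here is a stand-in written for the example (the real one is not shown in the question), and `QuadraticSketch` and `Coefficients` are hypothetical names; only the arithmetic mirrors the posted `Intersect` code.

```csharp
using System;

// Stand-in for the question's Vector3d (assumed to be a plain struct of doubles).
struct Vector3d
{
    public double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

static class QuadraticSketch
{
    // Computes the coefficients a, b, c of the ray/sphere quadratic,
    // evaluating each (pos - center) difference exactly once.
    public static void Coefficients(Vector3d dir, Vector3d pos, Vector3d center, double radius,
                                    out double a, out double b, out double c)
    {
        double dx = pos.x - center.x,
               dy = pos.y - center.y,
               dz = pos.z - center.z;

        a = dir.x * dir.x + dir.y * dir.y + dir.z * dir.z;
        b = 2 * (dir.x * dx + dir.y * dy + dir.z * dz);
        c = dx * dx + dy * dy + dz * dz - radius * radius;
    }

    static void Main()
    {
        double a, b, c;
        // Ray starting at (0, 0, -5) pointing along +z, unit sphere at the origin:
        // a = 1, b = -10, c = 24.
        Coefficients(new Vector3d(0, 0, 1), new Vector3d(0, 0, -5),
                     new Vector3d(0, 0, 0), 1.0, out a, out b, out c);
        Console.WriteLine("a={0}, b={1}, c={2}", a, b, c);
    }
}
```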

1 Answer

Introduction

I'd be willing to bet that the original code is so much slower because of a quirk in C# involving properties whose type is a struct. It's not exactly intuitive, but this kind of property is inherently slow. Why? Because structs are not passed by reference. So in order to access ray.Dir.x, you have to:

  1. Load local variable ray.
  2. Call get_Dir and store the result in a temporary variable. This involves copying the entire struct, even though only the field 'x' is ever used.
  3. Access field x from the temporary copy.

Looking at the original code, the get accessors are called 18 times (nine times each for get_Dir and get_Pos). This is a huge waste, because it means that the entire struct is copied 18 times overall. In your optimized code, there are only two copies - Dir and Pos are both called only once; further access to the values consists only of the third step from above:

  1. Access field x from the temporary copy.

To sum it up, structs and properties do not go together.
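
The copy semantics described above can be illustrated with a small sketch. The `Vector3d` and `Ray` types below are simplified stand-ins written for this example (not the asker's actual classes), and `CopySketch` and its methods are hypothetical names. Whether the JIT actually eliminates the copies in a release build is a separate question; the sketch only shows what the language semantics say.

```csharp
using System;

// Simplified stand-ins for the types in the question.
struct Vector3d
{
    public double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

class Ray
{
    private Vector3d _dir;
    public Ray(Vector3d dir) { _dir = dir; }

    // Returning a struct by value: each call hands the caller its own copy.
    public Vector3d Dir { get { return _dir; } }
}

static class CopySketch
{
    // Six getter calls, so (semantically) six struct copies.
    public static double LengthSquaredViaProperty(Ray ray)
    {
        return ray.Dir.x * ray.Dir.x +
               ray.Dir.y * ray.Dir.y +
               ray.Dir.z * ray.Dir.z;
    }

    // One getter call; the fields are then read from the local copy.
    public static double LengthSquaredViaLocal(Ray ray)
    {
        Vector3d dir = ray.Dir;
        return dir.x * dir.x + dir.y * dir.y + dir.z * dir.z;
    }

    static void Main()
    {
        var ray = new Ray(new Vector3d(1, 2, 3));

        Vector3d copy = ray.Dir;
        copy.x = 100;                  // changes only the local copy
        Console.WriteLine(ray.Dir.x);  // the ray's own field is still 1

        // Both variants compute the same value (1 + 4 + 9 = 14).
        Console.WriteLine(LengthSquaredViaProperty(ray) == LengthSquaredViaLocal(ray));
    }
}
```

The first `WriteLine` shows that the getter returned a copy rather than a reference to `_dir`; the two `LengthSquared` methods differ only in how many times that copy is requested.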

Why does C# behave this way with struct properties?

It has something to do with the fact that in C#, structs are value types. You are passing around the value itself, rather than a pointer to the value.

Why doesn't the compiler recognize that the get accessor is simply returning a field, and bypass the property altogether?

In debug mode, optimizations like this are skipped to provide a better debugging experience. Even in release mode, you'll find that most jitters don't often do this. I don't know exactly why, but I believe it is because the field is not always word-aligned. Modern CPUs have odd performance requirements. :-)

leviathanbadger