
I'm making a 3D OpenGL game with LWJGL. I replaced my class for 3D float vectors with a generic version and implemented the "clone()" method from "Cloneable". After that, performance dropped significantly (GC usage went from below 1% to 10%). Here is the vector addition code before and after the change:

Before:

public class Vec3f {
    public float x, y, z;

    ...

    public Vec3f add(Vec3f v) {
        return new Vec3f(x + v.x, y + v.y, z + v.z);
    }

    public Vec3f addThis(Vec3f v) {
        x += v.x;
        y += v.y;
        z += v.z;
        return this; // declared return type is Vec3f, so return this vector
    }
}

After:

public abstract class Vec<V extends Vec<V>> implements Cloneable {
    private Class<V> klass;
    protected float[] coords;

    protected Vec(int dim, Class<V> klass) {
        this(dim, new float[dim], klass);
    }

    public V clone() {
        try {
            V c = klass.newInstance();
            c.coords = this.coords.clone();
            return c;
        }
        catch(InstantiationException e1) {}
        catch(IllegalAccessException e2) {}
        return null;
    }

     ...

    public V add(V that) {
        V sum = this.clone();
        sum.addThis(that);
        return sum;
    }

    public void addThis(V that) {
        for (int i = 0; i < coords.length; i++) {
            coords[i] += that.coords[i];
        }
    }
}


public class Vec3 extends Vec<Vec3> {

    public Vec3() {
        super(3, Vec3.class);
    }
}

This makes no sense to me, since both versions appear to do exactly the same thing.

user2340939
  • Creating and looping through an array is not "the same thing" as reading a few variables. – shmosel Jul 01 '14 at 17:24
  • Is it such a big difference in creating 3 variables (before) and creating an array with 3 items (after)? – user2340939 Jul 01 '14 at 18:12
  • It's apples and oranges. Before, you had an object with 3 variables. Now, you're creating a new array on *every* call to `add`. – shmosel Jul 01 '14 at 18:27
  • Any suggestions on how to preserve the generics functionality and get rid of the array creation? – user2340939 Jul 01 '14 at 18:42
  • OK, I just noticed that you're also creating a new instance of `Vec` on every invocation. That's probably even more expensive than the `clone`. Does `Vec` need to be immutable? In any case, I don't see what generics has to do with array creation. The same logic you're using to copy the array can be used to copy individual variables. – shmosel Jul 01 '14 at 18:53
  • I confused reflection with generics, but that's cleared up now. Anyway, I am using arrays instead of variables because the class that extends the generic class Vec (Vec3 in our example, but it could also be Vec2, Vec4, etc.) determines how many components the vector has. Since I want common methods that work for all the extending classes, whatever their dimension (number of data members), an array seems the best solution. With individual variables it seems impossible. – user2340939 Jul 01 '14 at 19:05
  • I see. But I still don't understand why all the cloning is necessary. Why can't you just loop over the array of the item you want to add? – shmosel Jul 01 '14 at 19:11
  • As you can see, I have two versions of the addition method: add and addThis. addThis adds another vector to this vector. add, however, does not alter "this" vector; it creates a new vector (hence the clone() is needed) that holds the sum of this and the other vector. – user2340939 Jul 01 '14 at 19:16
  • I see what you're trying to do, but it has very little in common with the original version. – shmosel Jul 01 '14 at 19:44
  • Well the point was to use generics so I could make vectors of arbitrary dimensions, whereas in the first example I have only a 3D vector. – user2340939 Jul 01 '14 at 19:55
  • If speed is the goal, a class that can handle nD will be slower than one written for 3D. Even just having a loop is slower than having 3 additions, because a loop has condition checks and jumps in it. Unless, of course, HotSpot can unroll the loop for you, which would only be possible if HotSpot could prove the length of the for loop at compile time. – Chris K Jul 01 '14 at 20:27

1 Answer


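GC work is related to how many objects are allocated and how many of them are live on the heap. The second version of your code creates more objects, which creates more work for the GC: each call to add now allocates a new Vec3, the float[3] created by its constructor (discarded immediately), and the float[3] produced by coords.clone(), where the first version allocated a single Vec3f.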

The second version of the code is also likely to run slower. It uses reflection, which has some overhead, and it is likely to suffer from more CPU cache misses because it involves more pointer chasing.

That is, having x, y, z as fields will be faster than having an array referenced from the Vec3f class.
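
If the generic, array-backed design has to stay, one possible way to drop the reflection and the extra array allocation is to let each subclass construct its own instances. The following is only a sketch under that assumption: create(), copy() and get() are hypothetical names that do not appear in the question, and the classes are package-private to keep the example in one file.

```java
// Sketch only: create(), copy() and get() are hypothetical names, not from the question.
abstract class Vec<V extends Vec<V>> {
    protected final float[] coords;

    protected Vec(int dim) {
        this.coords = new float[dim];
    }

    /** Subclasses return a new zero vector of their own type; no Class<V> or reflection involved. */
    protected abstract V create();

    /** Stands in for the reflective clone(): one object and one array per call. */
    public V copy() {
        V c = create();
        System.arraycopy(coords, 0, c.coords, 0, coords.length);
        return c;
    }

    /** Returns a new vector, writing the sum directly instead of cloning and then adding. */
    public V add(V that) {
        V sum = create();
        for (int i = 0; i < coords.length; i++) {
            sum.coords[i] = coords[i] + that.coords[i];
        }
        return sum;
    }

    /** Mutates this vector in place and allocates nothing. */
    public void addThis(V that) {
        for (int i = 0; i < coords.length; i++) {
            coords[i] += that.coords[i];
        }
    }

    public float get(int i) {
        return coords[i];
    }
}

final class Vec3 extends Vec<Vec3> {
    public Vec3() {
        super(3);
    }

    public Vec3(float x, float y, float z) {
        this();
        coords[0] = x;
        coords[1] = y;
        coords[2] = z;
    }

    @Override
    protected Vec3 create() {
        return new Vec3();
    }
}
```

With this shape, add still allocates one Vec3 and one float[3] per call, so it will probably remain more expensive than the plain-field Vec3f; in hot loops addThis is the allocation-free option.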

Chris K
  • But isn't reflection done at compile time, so it does not affect performance? – user2340939 Jul 01 '14 at 17:25
  • Not in Java, no. You may be thinking of Generics. Reflection overheads have been getting lower and lower, and there are some Hotspot optimisations that can remove them entirely. But that is not common, yet. – Chris K Jul 01 '14 at 17:26
  • @user2340939 Reflection is *by definition* a runtime operation. From Wikipedia: "In computer science, reflection is the ability of a computer program to examine (see type introspection) and modify the structure and behavior (specifically the values, meta-data, properties and functions) of the program at runtime". I agree with Chris though that you are probably mixing up reflection with something else. – awksp Jul 01 '14 at 17:42
  • Yes, I confused reflection with generics, but that's cleared up now. So what would be the best way to tackle this and make performance MUCH better? – user2340939 Jul 01 '14 at 18:10
  • You are stretching the scope of the original question, and to be honest I don't believe that there is enough information here to do it justice. However here is some guidance to get you started. For a traditional Java solution, your first approach was good. It followed good OO principles, was simple and immutable. However if speed is now your goal, you will need to start selectively removing some of that good design and get closer to the hardware. Immutable classes are good, but can lead to phantom objects. When working on lots of numbers, GPUs always work on raw arrays. – Chris K Jul 01 '14 at 18:49
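
To illustrate the "raw arrays" remark in the last comment, here is a minimal sketch of storing many 3D vectors in one flat float[] so that arithmetic creates no temporary objects. The class name Vec3Array and all of its methods are hypothetical and chosen only for this example; nothing like it appears in the question.

```java
// Sketch only: Vec3Array and its methods are hypothetical, not part of the question's code.
final class Vec3Array {
    // Packed layout: x0, y0, z0, x1, y1, z1, ...
    private final float[] data;

    public Vec3Array(int count) {
        this.data = new float[count * 3];
    }

    public void set(int i, float x, float y, float z) {
        int o = i * 3;
        data[o] = x;
        data[o + 1] = y;
        data[o + 2] = z;
    }

    /** vector[dst] += vector[src]; touches only the backing array, allocates nothing. */
    public void addInPlace(int dst, int src) {
        int d = dst * 3;
        int s = src * 3;
        data[d] += data[s];
        data[d + 1] += data[s + 1];
        data[d + 2] += data[s + 2];
    }

    public float x(int i) { return data[i * 3]; }
    public float y(int i) { return data[i * 3 + 1]; }
    public float z(int i) { return data[i * 3 + 2]; }
}
```

The trade-off is the one the comment describes: the code is less object-oriented and vectors are addressed by index, but a frame's worth of additions produces no garbage for the collector.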