
Background: I am planning to port a library I have written from C++ to Java. The code deals with lists of n points in d dimensions and needs to compute scalar products, etc. To make the code independent of the storage format of the points, I introduced an interface,

public interface PointSetAccessor
{
  float coord(int p, int c);
}

that allows me to get the c-th coordinate (0 ≤ c < d) of the p-th point (0 ≤ p < n).

Problem: Since the code has to be really fast, I was wondering whether this would hurt performance compared to a direct access pattern like points[p][c], where points is an array of n arrays, each of which holds the d coordinates of one point.

Surprisingly, the opposite was the case: the code (see below) is 20% faster with the "indirect" access through a PointSetAccessor. (I measured this using time java -server -XX:+AggressiveOpts -cp bin Speedo and got around 14s for the direct version and 11s for the accessor version.)

Question: Any idea why this is so? Seems like Hotspot decides to optimise more aggressively or has more freedom to do so in the latter version?

Code (which computes nonsense):

public class Speedo
{
  public interface PointSetAccessor
  {
    float coord(int p, int c);
  }

  public static final class ArrayPointSetAccessor implements PointSetAccessor
  {
    private final float[][] array;

    public ArrayPointSetAccessor(float[][] array)
    {
      this.array = array;
    }

    public float coord(int point, int dim)
    {
      return array[point][dim];
    }
  }

  public static void main(String[] args)
  {
    final int n = 50000;
    final int d = 10;

    // Generate n points in dimension d
    final java.util.Random r = new java.util.Random(314);
    final float[][] a = new float[n][d];
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < d; ++j)
        a[i][j] = r.nextFloat();

    float result = 0.0f;
    if (true)
    {
      // Direct version
      for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; ++j)
        {
          float prod = 0.0f;
          for (int k = 0; k < d; ++k)
            prod += a[i][k] * a[j][k];
          result += prod;
        }
    }
    else
    {
      // Accessor-based version
      final PointSetAccessor ac = new ArrayPointSetAccessor(a);
      for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; ++j)
        {
          result += product(ac, d, i, j);
        }
    }
    System.out.println("result = " + result);
  }

  private final static float product(PointSetAccessor ac, int d, int i, int j)
  {
    float prod = 0.0f;
    for (int k = 0; k < d; ++k)
      prod += ac.coord(i, k) * ac.coord(j, k);
    return prod;
  }
}
Hbf
  • Most likely your microbenchmark is simply flawed. Switch the order of testing and direct/indirect and see how much it changes the results. To avoid warmup issues execute the entire test suite at least once before measuring. – Durandal Apr 17 '13 at 16:43
  • True! After extending the program to run the test twice (in the same VM) and measuring the CPU time of the second invocation, I do indeed see that _sometimes_ the two variants are equally fast. Sometimes, however, they are not; in these cases, the `PointSetAccessor` version was faster (in my experiments). I have not yet found out why the "direct" method is sometimes slower. Thanks, @Durandal. – Hbf Apr 17 '13 at 22:51
  • Having learnt a bit more about JVM benchmarking in the meantime, one should use something like [JMH](http://java-performance.info/jmh/) for this. – Hbf Oct 24 '15 at 08:56

2 Answers

Such short methods, if they are hot (called more than 10,000 times with default settings), will be inlined by HotSpot, so you should not see a performance difference. The way you measure performance ignores many effects, such as warm-up time, which can lead to erroneous results.

When running your code and asking hotspot to show what is inlined (-server -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining), you get the output below, which shows that both coord and product get inlined:

 76    1 %           javaapplication27.Speedo::main @ -2 (163 bytes)   made not entrant
 77    6             javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)
 78    7             javaapplication27.Speedo::product (45 bytes)
                        @ 18   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
                        @ 27   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
 80    2 %           javaapplication27.Speedo::main @ 101 (163 bytes)
                        @ 118   javaapplication27.Speedo::product (45 bytes)   inline (hot)
                          @ 18   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
                          @ 27   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
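
The warm-up effect mentioned above can be observed with a small hand-rolled harness. The following is a minimal sketch, not the asker's code: the class name WarmupDemo, the helper methods, and the reduced n = 2000 are assumptions made for illustration. It runs both variants several times in the same VM so that later rounds measure compiled code; a dedicated tool such as JMH remains the more reliable choice.

```java
// Hypothetical harness (names and sizes are illustrative, not from the original post).
final class WarmupDemo {
    interface PointSetAccessor {
        float coord(int p, int c);
    }

    static final class ArrayPointSetAccessor implements PointSetAccessor {
        private final float[][] array;

        ArrayPointSetAccessor(float[][] array) {
            this.array = array;
        }

        @Override
        public float coord(int point, int dim) {
            return array[point][dim];
        }
    }

    // Generates n pseudo-random points in dimension d.
    static float[][] randomPoints(int n, int d, long seed) {
        java.util.Random r = new java.util.Random(seed);
        float[][] a = new float[n][d];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < d; ++j)
                a[i][j] = r.nextFloat();
        return a;
    }

    // Sum of scalar products over all pairs i < j, using direct array access.
    static float direct(float[][] a, int d) {
        float result = 0.0f;
        for (int i = 0; i < a.length; ++i)
            for (int j = i + 1; j < a.length; ++j) {
                float prod = 0.0f;
                for (int k = 0; k < d; ++k)
                    prod += a[i][k] * a[j][k];
                result += prod;
            }
        return result;
    }

    // Same computation, but every coordinate is read through the interface.
    static float viaAccessor(PointSetAccessor ac, int n, int d) {
        float result = 0.0f;
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                float prod = 0.0f;
                for (int k = 0; k < d; ++k)
                    prod += ac.coord(i, k) * ac.coord(j, k);
                result += prod;
            }
        return result;
    }

    public static void main(String[] args) {
        final int n = 2000; // smaller than in the question so each round is quick
        final int d = 10;
        float[][] a = randomPoints(n, d, 314);
        PointSetAccessor ac = new ArrayPointSetAccessor(a);

        // Early rounds include interpretation and compilation time; compare later rounds.
        for (int round = 0; round < 5; ++round) {
            long t0 = System.nanoTime();
            float r1 = direct(a, d);
            long t1 = System.nanoTime();
            float r2 = viaAccessor(ac, n, d);
            long t2 = System.nanoTime();
            System.out.printf("round %d: direct %d ms, accessor %d ms (results %s / %s)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, r1, r2);
        }
    }
}
```

Printing both results keeps the JIT from discarding the loops as dead code, and swapping the order of the two calls is a quick way to check whether the ordering influences the timings.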
assylias
  • Actually, inlining can happen before compilation see e.g. http://www.oraclejavamagazine-digital.com/javamagazine/20120506/?pg=49#pg49 – kittylyst May 06 '13 at 11:27

If you are really worried about performance, you should investigate what replacing the two-dimensional array with a one-dimensional array would buy you.

Multi-dimensional arrays in Java are more costly than in most other languages because Java implements them as arrays of arrays: in an N-dimensional array, every level except the last is an array of references to the arrays of the next level.

For your float[50000][10], this means there is one array of 50000 references, each pointing to a float[10]. Each of those arrays is also an object (with a few bytes of header), and since the last dimension is pretty small (10), the overhead is significant in terms of memory usage (the reverse case float[10][50000] has a considerably smaller memory footprint).
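
To put rough numbers on this overhead, here is a back-of-the-envelope estimate. The constants are assumptions and not taken from the answer: a 16-byte array header, 4-byte (compressed) references, and 8-byte object alignment, which is typical for a 64-bit HotSpot VM but varies between JVMs and settings. The class name ArrayFootprint is made up for the example.

```java
// Rough footprint estimate; all constants are assumptions about a typical 64-bit JVM.
final class ArrayFootprint {
    static final long ARRAY_HEADER_BYTES = 16; // assumed array object header size
    static final long REFERENCE_BYTES = 4;     // assumed compressed reference size
    static final long FLOAT_BYTES = 4;
    static final long ALIGNMENT = 8;           // assumed object alignment

    // Rounds a size up to the next multiple of ALIGNMENT.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated size of a single float[length].
    static long floatArrayBytes(long length) {
        return align(ARRAY_HEADER_BYTES + length * FLOAT_BYTES);
    }

    // Estimated size of a float[outer][inner]: one reference array plus 'outer' float arrays.
    static long nestedBytes(long outer, long inner) {
        long outerArray = align(ARRAY_HEADER_BYTES + outer * REFERENCE_BYTES);
        return outerArray + outer * floatArrayBytes(inner);
    }

    public static void main(String[] args) {
        System.out.println("float[50000][10] ~ " + nestedBytes(50000, 10) + " bytes");
        System.out.println("float[10][50000] ~ " + nestedBytes(10, 50000) + " bytes");
        System.out.println("float[500000]    ~ " + floatArrayBytes(500000) + " bytes");
    }
}
```

Under these assumptions the float[50000][10] layout needs about 3.0 MB, while float[10][50000] and a flat float[500000] both need about 2.0 MB, so roughly a third of the first layout is headers and references.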

Try a memory layout like this:

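public static final class ArrayPointSetAccessor implements PointSetAccessor {
    // Number of entries stored per coordinate; with this indexing that is the number of points n
    private final int dimSize;
    // All values in one array: coordinate 0 of every point, then coordinate 1 of every point, ...
    private final float[] array;

    public ArrayPointSetAccessor(float[] array, int dimSize) {
        this.dimSize = dimSize;
        this.array = array;
    }

    @Override
    public float coord(int point, int dim) {
        // Coordinate-major lookup; requires 0 <= point < dimSize
        return array[dim * dimSize + point];
    }
}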

I expect the accessor to cost a little performance in a non-trivial scenario (e.g. when the interface has more than one implementation). But go with the accessor interface anyway - flexibility and maintainability are usually more than worth a few percent of performance.
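
As a usage illustration, here is a variation on the layout above that stores the values point by point (index point * d + dim) instead of coordinate by coordinate, so that the d coordinates of one point sit next to each other when a scalar product walks over them. This is a sketch under assumptions: FlatLayoutDemo, FlatPointSetAccessor, and flatten are hypothetical names, and which of the two flat layouts is faster for a given workload would have to be measured.

```java
// Hypothetical point-major variant of the flat layout (names are illustrative).
final class FlatLayoutDemo {
    interface PointSetAccessor {
        float coord(int p, int c);
    }

    static final class FlatPointSetAccessor implements PointSetAccessor {
        private final float[] array; // point 0's d coordinates, then point 1's, ...
        private final int d;         // number of coordinates per point

        FlatPointSetAccessor(float[] array, int d) {
            this.array = array;
            this.d = d;
        }

        @Override
        public float coord(int point, int dim) {
            return array[point * d + dim];
        }
    }

    // Copies an array of n points with d coordinates each into one flat array.
    static float[] flatten(float[][] points, int d) {
        float[] flat = new float[points.length * d];
        for (int p = 0; p < points.length; ++p)
            System.arraycopy(points[p], 0, flat, p * d, d);
        return flat;
    }

    // Scalar product of points i and j, read through the accessor.
    static float product(PointSetAccessor ac, int d, int i, int j) {
        float prod = 0.0f;
        for (int k = 0; k < d; ++k)
            prod += ac.coord(i, k) * ac.coord(j, k);
        return prod;
    }

    public static void main(String[] args) {
        float[][] points = { { 1f, 2f, 3f }, { 4f, 5f, 6f } };
        PointSetAccessor ac = new FlatPointSetAccessor(flatten(points, 3), 3);
        System.out.println("product = " + product(ac, 3, 0, 1)); // prints "product = 32.0"
    }
}
```

Because callers only see PointSetAccessor, switching between the nested and either flat layout does not require touching the code that computes the products.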

Durandal
  • Thanks for the comment, I verified this: it is indeed slightly faster with the one-dimensional array. – And yes, the memory footprint is better. – Hbf Apr 17 '13 at 22:48