
I'm working on an optimized Java library and I'm wondering whether having things like

    int rX = rhsOffset;
    int rY = rhsOffset + 1;
    int rZ = rhsOffset + 2;
    int rW = rhsOffset + 3;

where the local variable rX is redundant but makes the code further down more readable, costs anything. Does rX in this case just get compiled out, either in the Java bytecode or at JIT execution time?
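
For context, the kind of method where I want to do this looks roughly like the following (a simplified illustration, not my actual library code):

    public static void add(float[] lhs, int lhsOffset,
                           float[] rhs, int rhsOffset,
                           float[] out, int outOffset) {
        // Redundant locals, purely so the math below reads as x/y/z/w.
        int rX = rhsOffset;
        int rY = rhsOffset + 1;
        int rZ = rhsOffset + 2;
        int rW = rhsOffset + 3;

        out[outOffset]     = lhs[lhsOffset]     + rhs[rX];
        out[outOffset + 1] = lhs[lhsOffset + 1] + rhs[rY];
        out[outOffset + 2] = lhs[lhsOffset + 2] + rhs[rZ];
        out[outOffset + 3] = lhs[lhsOffset + 3] + rhs[rW];
    }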

Also, I've seen libraries do things like

    m[offset + 0] = f / aspect;
    m[offset + 1] = 0.0f;
    m[offset + 2] = 0.0f;
    m[offset + 3] = 0.0f;

where " + 0 " is done to improve the look of the code.

I want to do the same but would like to make sure I'm not hurting performance. I don't know of any good way to determine whether memory is allocated or extra math is executed in either of these cases. In Android Studio you can use the memory profiler to capture all allocations and examine them, but IntelliJ doesn't appear to offer that functionality, and I'm assuming I can't rely on any optimizations Android's build system performs being applied to a normal (non-Android) Java project.

Hex Crown
  • That's micro-optimization. Even if it did affect performance, you're most likely writing other code that's much more inefficient. Concentrate on things that matter; if you think it makes your code more readable, do it. You can always optimize it away if it shows up in profiling results (it won't). – Kayaman Mar 19 '18 at 11:02
  • Also keep in mind that the JIT is a clever thing. It will optimize away simple things like reusable variables to make sure your code runs smoothly. Trying to be more clever than the compiler is not going to get you far in most cases. – Ben Mar 19 '18 at 11:09
  • "I'm working on an optimized java library" then I hope that you're using a profiler to determine where to focus your optimization efforts. If so, you'd know if there is a performance impact. – Andy Turner Mar 19 '18 at 11:11
  • Why not just rename `rhsOffset` to `rX`? And since when does W come after Z? Don't you think you might be kidding yourself about readability here? It just confuses the hell out of me. – user207421 Mar 19 '18 at 11:12
  • @EJP well W is the homogeneous coordinate of a three-dimensional vertex, so yes... it comes after Z. – Ben Mar 19 '18 at 11:18
  • @Ben If any of that information appeared in the question, or as a comment in the code, you might have a point. As it doesn't, you don't. – user207421 Mar 19 '18 at 11:20
  • @AndyTurner If there is no impact on my profiler, would that be the case regardless of the JVM being used? For instance, if I'm using the Java 9 JVM but someone uses an older JVM, would what I see on mine result in the same performance on theirs? – Hex Crown Mar 19 '18 at 11:24
  • @Kayaman the code will be hot (used extremely often), so ideally I'd like to have it reasonably well optimized from the get-go. – Hex Crown Mar 19 '18 at 11:25
  • @EJP the reason behind not renaming it is that rhsOffset is the offset index the vector exists at in the array, but internally it's used as the x index, so from an external view rhsOffset makes more sense and from an internal view rX makes more sense. Also, as Ben said, this is for vectors/vertices. (Granted, that wasn't mentioned in the question as I thought it wasn't relevant.) – Hex Crown Mar 19 '18 at 11:28
  • @HexCrown "If there is no impact on my profiler would that be the case regardless of the JVM being used?" If you don't know the answer to this question, what do you mean by "optimized"? You can only know the impact by measuring on all the JVMs on which you want to consider the library to be optimized. – Andy Turner Mar 19 '18 at 11:33
  • @AndyTurner well I was hoping someone would know if the above example would have a standard behavior (such as being compiled out, etc.). I don't think it's an unreasonable question, as I'm sure someone out there knows off the top of their head whether it's the case or not. I was hoping to find out without having to install a bunch of different JVMs etc. If you don't know the answer, that's fine, no need to berate someone, just move on. – Hex Crown Mar 19 '18 at 11:47
  • Something common with these "I'm writing an optimized X" questions is that the author usually has only a very vague idea of how to write optimized anything. – Kayaman Mar 19 '18 at 12:17
  • @HexCrown I wasn't looking to berate you, I am pointing out that the only way you can know if something is "optimized" - your stated goal - is to measure. Optimization in the absence of measurement isn't optimization, it's *guessing*. There isn't a standard way of handling +0; it's up to compiler and JVM implementors to decide what to do, within the bounds of the specs. If you are unable to measure on a range of JVMs, do it on a common one; there is a *reasonable* chance it's the same on others, but just don't rely on it. – Andy Turner Mar 19 '18 at 12:17
  • @Kayaman I'm not new to optimization: my code base consists of reusing allocated memory and contiguous memory allocation (using packed primitive arrays) to take advantage of cache-line reads, I'm also working with branch prediction as well as making sure to watch for stalls. I haven't begun adding thread pooling yet for parallel task execution, but I'm also going to be implementing that. So while most people might not know how to write optimized code, I don't consider myself to be in that group; I'm simply wanting to eliminate some of the research time by finding answers online. – Hex Crown Mar 19 '18 at 12:30
  • Then you should include that information in the questions. Most performance related questions here are based on entirely false premises. How are you profiling your library? – Kayaman Mar 19 '18 at 12:34
  • @Kayaman part of my library was profiled in Android Studio, using their built-in profiler; I'm currently porting that stuff and haven't profiled it outside of that yet. That said, for the new stuff I've been using code-based tests to check timestamps etc., running the tests multiple passes to allow for caching, then benchmarking against more "normal" approaches for smaller tests like applying formulas per element to large arrays. I'm still looking for a tool that gives me as detailed info as what Android Studio could provide though (gc, individual allocation tracking, stack walking, etc.) – Hex Crown Mar 19 '18 at 12:39
  • @Kayaman I wouldn't go into such obsessive optimization so early, but I've actually spent years on Android development, specifically realtime graphics with GLES, and things like the "stutter" caused by the stop-the-world event when the GC runs plagued my original code base and were far too much work to optimize out/fix later on (but weren't noticeable early on), so I'm trying to squash as many of those issues up front for "hot" code paths to try not to run into the same situation as in the past. – Hex Crown Mar 19 '18 at 12:43
  • But you're working in the Java world now, not in the Android world. Don't make the mistake of thinking they're similar. – Kayaman Mar 19 '18 at 13:03
  • The HotSpot JIT compilers, in particular, do a much better job of optimizing than old Android compilers did. – Stephen C Mar 19 '18 at 13:09
  • @Kayaman Yeah, I get that, and that's why I've been looking for a good profiler (found a couple that look promising, will have to do more testing). – Hex Crown Mar 19 '18 at 13:16
  • @StephenC The Android compilers I've used recently have all been quite modern, but yes, I still have to try and get some good profiling done on the code under the JVM. – Hex Crown Mar 19 '18 at 13:18

1 Answer


I wrote some code in order to investigate this experimentally; see my repository on GitHub.

Summary: I ran some experiments on my 64-bit Ubuntu computer with Oracle JDK 9. From what I can tell, with these particular experiments, (i) redundant variables don't seem to affect runtime in practice and (ii) whether you add a redundant 0 or not does not seem to matter. My advice is not to worry about the kind of performance concerns that you mention: the just-in-time compiler is probably smart enough for these sorts of things, and poor performance might never be a problem in the first place.

For the first question, I ran my experiment with both the Oracle JDK 9 javac compiler and the embeddable Janino compiler. I get similar results for both, suggesting that most of the relevant optimizations are carried out by the JIT rather than by the source-to-bytecode compiler.
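
For reference, driving Janino boils down to compiling the generated source in memory and calling it reflectively, roughly like this (an illustrative sketch with made-up class and method names, not the exact code from my repository):

    import java.lang.reflect.Method;
    import org.codehaus.janino.SimpleCompiler;

    public class JaninoExample {
        public static void main(String[] args) throws Exception {
            String source =
                "public class Generated {\n" +
                "  public static double eval0(double[] X, double[] Y) {\n" +
                "    double sum = 0.0;\n" +
                "    for (int i = 0; i < X.length; i++) { sum += X[i] * Y[i]; }\n" +
                "    return sum;\n" +
                "  }\n" +
                "}\n";

            // Compile the generated source in memory and look up the method reflectively.
            SimpleCompiler compiler = new SimpleCompiler();
            compiler.cook(source);
            Class<?> cls = compiler.getClassLoader().loadClass("Generated");
            Method eval0 = cls.getMethod("eval0", double[].class, double[].class);

            double result = (Double) eval0.invoke(null, new double[]{1, 2}, new double[]{3, 4});
            System.out.println(result); // prints 11.0
        }
    }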

I would advise you to do your own experiments on your JVM, with toy examples that you believe are representative of what you are doing. Or measure directly in your actual code, in case poor performance turns out to be a problem.
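
If you want a more careful harness than hand-rolled timing, a JMH benchmark along these lines is the usual approach (this is just an illustration of how you could compare the two indexing styles yourself; it is not part of my repository):

    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class IndexingBenchmark {

        private final double[] m = new double[1_000_000];

        @Setup
        public void setup() {
            Random rng = new Random(42);
            for (int i = 0; i < m.length; i++) {
                m[i] = rng.nextDouble();
            }
        }

        // Reads every element using "offset + 0" style indexing.
        @Benchmark
        public double withPlusZero() {
            double sum = 0.0;
            for (int offset = 0; offset < m.length; offset += 4) {
                sum += m[offset + 0] + m[offset + 1] + m[offset + 2] + m[offset + 3];
            }
            return sum; // returning the result keeps the JIT from eliminating the loop
        }

        // Same computation without the redundant "+ 0".
        @Benchmark
        public double withoutPlusZero() {
            double sum = 0.0;
            for (int offset = 0; offset < m.length; offset += 4) {
                sum += m[offset] + m[offset + 1] + m[offset + 2] + m[offset + 3];
            }
            return sum;
        }
    }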

Details about my experiments follow below.

Question 1: Does introducing redundant variables affect execution time?

I introduced a parameter, let's call it n, that controls the number of redundant assignments, and wrote a code generator that generates code for a nonsense computation and introduces redundant assignments based on the value of n. For instance, for n=0 it produces this code:

public static double eval0(double[] X, double[] Y) {
  double sum = 0.0;
  assert(X.length == Y.length);
  int iters = X.length/3;
  for (int i = 0; i < iters; i++) {
    int at = 3*i;
    double x0 = X[at + 0];
    double x1 = X[at + 1];
    double x2 = X[at + 2];
    double y0 = Y[at + 0];
    double y1 = Y[at + 1];
    double y2 = Y[at + 2];
          double x1y2 = x1*y2;
          double x2y1 = x2*y1;
          double a = x1y2-x2y1;
          double x2y0 = x2*y0;
          double x0y2 = x0*y2;
          double b = x2y0-x0y2;
          double x0y1 = x0*y1;
          double x1y0 = x1*y0;
          double c = x0y1-x1y0;
    sum += a + b + c;
  }
return sum;

}

and for, say, n=3 it produces this code:

public static double eval3(double[] X, double[] Y) {
  double sum = 0.0;
  assert(X.length == Y.length);
  int iters = X.length/3;
  for (int i = 0; i < iters; i++) {
    int at = 3*i;
    double x0 = X[at + 0];
    double x1 = X[at + 1];
    double x2 = X[at + 2];
    double y0 = Y[at + 0];
    double y1 = Y[at + 1];
    double y2 = Y[at + 2];
          double x1y2_28 = x1*y2;
          double x1y2_29 = x1y2_28;
          double x1y2_30 = x1y2_29;
          double x1y2 = x1y2_30;
          double x2y1_31 = x2*y1;
          double x2y1_32 = x2y1_31;
          double x2y1_33 = x2y1_32;
          double x2y1 = x2y1_33;
          double a_34 = x1y2-x2y1;
          double a_35 = a_34;
          double a_36 = a_35;
          double a = a_36;
          double x2y0_37 = x2*y0;
          double x2y0_38 = x2y0_37;
          double x2y0_39 = x2y0_38;
          double x2y0 = x2y0_39;
          double x0y2_40 = x0*y2;
          double x0y2_41 = x0y2_40;
          double x0y2_42 = x0y2_41;
          double x0y2 = x0y2_42;
          double b_43 = x2y0-x0y2;
          double b_44 = b_43;
          double b_45 = b_44;
          double b = b_45;
          double x0y1_46 = x0*y1;
          double x0y1_47 = x0y1_46;
          double x0y1_48 = x0y1_47;
          double x0y1 = x0y1_48;
          double x1y0_49 = x1*y0;
          double x1y0_50 = x1y0_49;
          double x1y0_51 = x1y0_50;
          double x1y0 = x1y0_51;
          double c_52 = x0y1-x1y0;
          double c_53 = c_52;
          double c_54 = c_53;
          double c = c_54;
    sum += a + b + c;
  }
return sum;

}

Both these functions perform exactly the same computation, but one has more redundant assignments. Finally, I also generate a dispatch function:

public double eval(int n, double[] X, double[] Y) {
  switch (n) {
    case 0: return eval0(X, Y);
    case 1: return eval1(X, Y);
    case 2: return eval2(X, Y);
    case 3: return eval3(X, Y);
    case 4: return eval4(X, Y);
    case 5: return eval5(X, Y);
    case 8: return eval8(X, Y);
    case 11: return eval11(X, Y);
    case 15: return eval15(X, Y);
    case 21: return eval21(X, Y);
    case 29: return eval29(X, Y);
    case 40: return eval40(X, Y);
    case 57: return eval57(X, Y);
    case 79: return eval79(X, Y);
    case 111: return eval111(X, Y);
    case 156: return eval156(X, Y);
    case 218: return eval218(X, Y);
    case 305: return eval305(X, Y);
  }
  assert(false);
  return -1;
}

All the generated code is on my repo here.
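
The generator itself is nothing fancy; the redundant-assignment chains can be produced with plain string building along these lines (an illustrative sketch, not the exact generator from the repository):

    // Illustrative helper: emits a chain of n redundant copies before the final
    // assignment, e.g. for name = "x1y2", expr = "x1*y2", n = 3:
    //   double x1y2_28 = x1*y2;
    //   double x1y2_29 = x1y2_28;
    //   double x1y2_30 = x1y2_29;
    //   double x1y2 = x1y2_30;
    // counter[0] is a running index shared across all chains so temp names stay unique.
    static String redundantChain(String name, String expr, int n, int[] counter) {
        StringBuilder sb = new StringBuilder();
        String prev = expr;
        for (int i = 0; i < n; i++) {
            String tmp = name + "_" + (counter[0]++);
            sb.append("double ").append(tmp).append(" = ").append(prev).append(";\n");
            prev = tmp;
        }
        sb.append("double ").append(name).append(" = ").append(prev).append(";\n");
        return sb.toString();
    }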

Then I benchmark all these functions for different values of n on X and Y arrays of size 10000 filled with random data. I did this using both the Oracle JDK 9 javac compiler and the embeddable Janino compiler. My benchmarking code also lets the JIT warm up a bit.
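
The timing itself is a plain warm-up-then-measure loop; in outline it does something like this (simplified, with illustrative iteration counts; the actual harness is in the repository):

    // Simplified outline of the measurement: call the function many times so the
    // JIT gets a chance to compile it, then time one measured pass with nanoTime().
    double benchmark(int n, double[] X, double[] Y) {
        for (int i = 0; i < 10_000; i++) {
            eval(n, X, Y);                 // warm-up iterations, results discarded
        }
        long start = System.nanoTime();
        double result = eval(n, X, Y);     // measured call
        System.out.println("Elapsed: " + (System.nanoTime() - start) / 1e6 + " msecs");
        return result;
    }

Running the benchmark produces this output: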

------ USING JAVAC
n = 0
"Elapsed time: 0.067189 msecs"
   Result= -9.434172113697462
n = 1
"Elapsed time: 0.05514 msecs"
   Result= -9.434172113697462
n = 2
"Elapsed time: 0.04627 msecs"
   Result= -9.434172113697462
n = 3
"Elapsed time: 0.041316 msecs"
   Result= -9.434172113697462
n = 4
"Elapsed time: 0.038673 msecs"
   Result= -9.434172113697462
n = 5
"Elapsed time: 0.036372 msecs"
   Result= -9.434172113697462
n = 8
"Elapsed time: 0.203788 msecs"
   Result= -9.434172113697462
n = 11
"Elapsed time: 0.031491 msecs"
   Result= -9.434172113697462
n = 15
"Elapsed time: 0.032673 msecs"
   Result= -9.434172113697462
n = 21
"Elapsed time: 0.030722 msecs"
   Result= -9.434172113697462
n = 29
"Elapsed time: 0.039271 msecs"
   Result= -9.434172113697462
n = 40
"Elapsed time: 0.030785 msecs"
   Result= -9.434172113697462
n = 57
"Elapsed time: 0.032382 msecs"
   Result= -9.434172113697462
n = 79
"Elapsed time: 0.033021 msecs"
   Result= -9.434172113697462
n = 111
"Elapsed time: 0.029978 msecs"
   Result= -9.434172113697462
n = 156
"Elapsed time: 18.003687 msecs"
   Result= -9.434172113697462
n = 218
"Elapsed time: 24.163828 msecs"
   Result= -9.434172113697462
n = 305
"Elapsed time: 33.479853 msecs"
   Result= -9.434172113697462
------ USING JANINO
n = 0
"Elapsed time: 0.032084 msecs"
   Result= -9.434172113697462
n = 1
"Elapsed time: 0.032022 msecs"
   Result= -9.434172113697462
n = 2
"Elapsed time: 0.029989 msecs"
   Result= -9.434172113697462
n = 3
"Elapsed time: 0.034251 msecs"
   Result= -9.434172113697462
n = 4
"Elapsed time: 0.030606 msecs"
   Result= -9.434172113697462
n = 5
"Elapsed time: 0.030186 msecs"
   Result= -9.434172113697462
n = 8
"Elapsed time: 0.032132 msecs"
   Result= -9.434172113697462
n = 11
"Elapsed time: 0.030109 msecs"
   Result= -9.434172113697462
n = 15
"Elapsed time: 0.031009 msecs"
   Result= -9.434172113697462
n = 21
"Elapsed time: 0.032625 msecs"
   Result= -9.434172113697462
n = 29
"Elapsed time: 0.031489 msecs"
   Result= -9.434172113697462
n = 40
"Elapsed time: 0.030665 msecs"
   Result= -9.434172113697462
n = 57
"Elapsed time: 0.03146 msecs"
   Result= -9.434172113697462
n = 79
"Elapsed time: 0.031599 msecs"
   Result= -9.434172113697462
n = 111
"Elapsed time: 0.029998 msecs"
   Result= -9.434172113697462
n = 156
"Elapsed time: 17.579771 msecs"
   Result= -9.434172113697462
n = 218
"Elapsed time: 24.561065 msecs"
   Result= -9.434172113697462
n = 305
"Elapsed time: 33.357928 msecs"
   Result= -9.434172113697462

From the above output, it appears that javac and Janino produce about equally performant code, and that for low values of n the exact value doesn't seem to matter. However, at n=156 we observe a dramatic increase in runtime. I don't know exactly why that is, but I suspect it has to do with the number of local variables being limited on the JVM, so that the Java compiler (javac/Janino) has to use workarounds to overcome that limitation, and those workarounds are harder for the JIT to optimize. Another possibility is that the method simply becomes so large that HotSpot declines to JIT-compile it at all (very large methods are not compiled by default) and it keeps running interpreted. This is speculation; maybe someone can shed some light on it.

Question 2: Does redundantly adding 0 affect performance?

I wrote a class to experiment with this. The class has two static methods that both do exactly the same computation, except that in apply0 we also add 0 when we compute the array indices:

public class Mul2d {
    public static double[] apply0(double angle, double[] X) {
        int n = X.length/2;
        double[] Y = new double[2*n];
        double cosv = Math.cos(angle);
        double sinv = Math.sin(angle);
        for (int i = 0; i < n; i++) {
            int at = 2*i;
            Y[at + 0] = cosv*X[at + 0] - sinv*X[at + 1];
            Y[at + 1] = sinv*X[at + 0] + cosv*X[at + 1];
        }
        return Y;
    }

    public static double[] apply(double angle, double[] X) {
        int n = X.length/2;
        double[] Y = new double[2*n];
        double cosv = Math.cos(angle);
        double sinv = Math.sin(angle);
        for (int i = 0; i < n; i++) {
            int at = 2*i;
            Y[at] = cosv*X[at] - sinv*X[at + 1];
            Y[at + 1] = sinv*X[at] + cosv*X[at + 1];
        }
        return Y;
    }
}
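
The driver for this comparison is nothing more than filling an array with random data, warming up, and timing a few passes of each variant; roughly like this (illustrative; array size and iteration counts are arbitrary):

    import java.util.Random;

    public class Mul2dBenchmark {
        public static void main(String[] args) {
            double[] X = new double[1_000_000];
            Random rng = new Random(0);
            for (int i = 0; i < X.length; i++) {
                X[i] = rng.nextDouble();
            }

            // Warm up both variants so the JIT has compiled them before we measure.
            for (int i = 0; i < 100; i++) {
                Mul2d.apply0(0.5, X);
                Mul2d.apply(0.5, X);
            }

            // Time a few passes of each variant.
            for (int i = 0; i < 10; i++) {
                long t0 = System.nanoTime();
                Mul2d.apply0(0.5, X);
                long t1 = System.nanoTime();
                Mul2d.apply(0.5, X);
                long t2 = System.nanoTime();
                System.out.println("with '+ 0': " + (t1 - t0) / 1e6 + " msecs, "
                        + "without: " + (t2 - t1) / 1e6 + " msecs");
            }
        }
    }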

Running a benchmark like this on a large array suggests that whether you add the 0 or not does not matter. Here is the output of the benchmark:

With adding '+ 0'
"Elapsed time: 0.247315 msecs"
"Elapsed time: 0.235471 msecs"
"Elapsed time: 0.240675 msecs"
"Elapsed time: 0.251799 msecs"
"Elapsed time: 0.267139 msecs"
"Elapsed time: 0.250735 msecs"
"Elapsed time: 0.251697 msecs"
"Elapsed time: 0.238652 msecs"
"Elapsed time: 0.24872 msecs"
"Elapsed time: 1.274368 msecs"
Without adding '+ 0'
"Elapsed time: 0.239371 msecs"
"Elapsed time: 0.233459 msecs"
"Elapsed time: 0.228619 msecs"
"Elapsed time: 0.389649 msecs"
"Elapsed time: 0.238742 msecs"
"Elapsed time: 0.23459 msecs"
"Elapsed time: 0.23452 msecs"
"Elapsed time: 0.241013 msecs"
"Elapsed time: 0.356035 msecs"
"Elapsed time: 0.260892 msecs"

Runtimes appear pretty much equivalent; any differences drown in the noise.

Conclusion: Regarding Question 1, I cannot observe any negative impact on performance for this particular toy problem.

Regarding Question 2, whether you add the +0 doesn't seem to matter: either the JIT optimizes the +0 away, or the other computations in the loop dominate the total time, so that any small extra cost of the +0 drowns in the noise.

Rulle
  • I appreciate all the time and effort put into this testing, and I also find it quite interesting that a high number of redundant variables (150+) causes a large performance hit. In my use case the number of variables would be far fewer than that, so it looks like I've got nothing to worry about. Thanks again. – Hex Crown Apr 20 '18 at 00:11
  • You're welcome! But you may want to run your own experiments; maybe my setup is not representative of yours, in case you have a different JVM or something. – Rulle Apr 20 '18 at 04:59