
I don't Java much.

I am writing some optimized math code and I was shocked by my profiler results. My code collects values, interleaves the data and then chooses the values based on that. Java runs slower than my C++ and MATLAB implementations.

I am using javac 1.7.0_05, from the Sun/Oracle JDK 1.7.05.

A floor function performs a relevant task in the code.

[Image: java math.floor profile results]

  1. Does anybody know of the paradigmatic way to fix this?
  2. I noticed that my floor() function is defined with something called StrictMath. Is there something like -ffast-math for Java? I am expecting there must be a way to change the floor function to something more computationally reasonable without writing my own.

    public static double floor(double a) {
        return StrictMath.floor(a); // default impl. delegates to StrictMath
    }
    

Edit

So a few people suggested I try to do a cast. I tried this and there was absolutely no change in walltime.

private static int flur(float dF)
{
    return (int) dF;
}

413742 cast floor function

394675 Math.floor

These tests were run without the profiler. I tried to use a profiler, but it altered the runtime drastically (15+ minutes, so I quit).

Mikhail
  • if you have to use `StrictMath.*` (which is slower than `Math.*` in general for being more accurate) -- you may want to cache the floor results in map, if there is repeated calculation of same value for floor. – Nishant Aug 21 '12 at 06:23
  • Could you give some more context around your code? How many calls are there to `Math.floor()`? What kind of numbers do you work with? Oh and the fact that C++ or MATLAB is faster in mathematical operations is not really surprising. – posdef Aug 21 '12 at 06:25
  • @Nishant the OP states that `Math.floor()` delegates to `StrictMath.floor()`, see: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Math.java#Math.floor%28double%29 – posdef Aug 21 '12 at 06:27
  • @posdef hey thanks for pointing that out. I see the `StrictMath` uses native -- `public static native double floor(double a);` – Nishant Aug 21 '12 at 06:59
  • Even in C/C++, `floor()` is known to be obnoxiously slow. It's not that flooring is a slow operation. There are bitwise hacks to make it extremely fast. But there are many corner cases that need to be checked for and dealt with. The SSE4.1 instruction set adds native instructions for rounding operations including `floor()`. But accessing those requires C/C++ with compiler intrinsics or assembly. – Mysticial Aug 21 '12 at 07:20
  • @Misha could you possibly post the relevant piece of code (i.e. where the `Math.floor()` calls are made)? – posdef Aug 21 '12 at 07:36
  • I guess that floor() is not what takes up all the time. The profiler seems to misguide in this case. – atamanroman Aug 21 '12 at 09:27

6 Answers


You might want to give FastMath a try.

Here is a post about the performance of Math in Java vs. Javascript. There are a few good hints about why the default math lib is slow. They are discussing other operations than floor, but I guess their findings can be generalized. I found it interesting.

EDIT

According to this bug entry, floor has been implemented in pure Java since 7(b79) and 6u21(b01), resulting in better performance. The code of floor in JDK 6 is still a bit longer than the one in FastMath, but it is probably not responsible for such a performance degradation. Which JDK are you using? Could you try a more recent version?

ewernli

Here's a sanity check for your hypothesis that the code is really spending 99% of its time in floor. Let's assume that you have Java and C++ versions of the algorithm that are both correct in terms of the outputs they produce. For the sake of the argument, let us assume that the two versions call the equivalent floor functions the same number of times. So a time function is

t(input) = nosFloorCalls(input) * floorTime + otherTime(input)

where floorTime is the time taken for a call to floor on the platform.

Now if your hypothesis is correct, and floorTime is vastly more expensive on Java (to the extent that it takes roughly 99% of the execution time) then you would expect the Java version of the application to run a large factor (50 times or more) slower than the C++ version. If you don't see this, then your hypothesis most likely is false.
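To see where a factor like 50 comes from, here is a minimal sketch of the arithmetic. It assumes otherTime is identical in both versions; the class name SlowdownEstimate, the method impliedSlowdown, and the 50% share for C++ are illustrative assumptions, not measurements from the question.

```java
import java.util.Locale;

// Hypothetical helper: not from the question, only illustrates the formula above.
class SlowdownEstimate {

    // If floor takes javaFloorShare of the Java runtime and cppFloorShare of the
    // C++ runtime, and otherTime is the same in both, then
    //   tJava = otherTime / (1 - javaFloorShare)
    //   tCpp  = otherTime / (1 - cppFloorShare)
    // so tJava / tCpp = (1 - cppFloorShare) / (1 - javaFloorShare).
    static double impliedSlowdown(double javaFloorShare, double cppFloorShare) {
        return (1.0 - cppFloorShare) / (1.0 - javaFloorShare);
    }

    public static void main(String[] args) {
        // Assumed shares: 99% in Java (the profiler's claim), 50% in C++ (a guess).
        double factor = impliedSlowdown(0.99, 0.50);
        System.out.println(String.format(Locale.ROOT, "Implied slowdown: %.1fx", factor));
    }
}
```

With these assumed shares the factor comes out at about 50. A smaller C++ share pushes it higher, up to about 100 if floor were free in C++.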


If the hypothesis is false, here are two alternative explanations for the profiling results.

  1. This is a measurement anomaly; i.e. the profiler has somehow got it wrong. Try using a different profiler.

  2. There is a bug in the Java version of your code that is causing it to call floor many, many more times than in the C++ version of the code.
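One rough way to test the second explanation is to count the calls. The sketch below assumes you can route the algorithm's floor calls through a wrapper; CountingFloor and its methods are hypothetical names, and the loop in main only stands in for the real algorithm.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper: call CountingFloor.floor instead of Math.floor while debugging.
class CountingFloor {
    private static final AtomicLong CALLS = new AtomicLong();

    // Same result as Math.floor, but records each invocation.
    static double floor(double a) {
        CALLS.incrementAndGet();
        return Math.floor(a);
    }

    static long calls() {
        return CALLS.get();
    }

    static void reset() {
        CALLS.set(0);
    }

    public static void main(String[] args) {
        double sum = 0;
        // Stand-in for the real algorithm.
        for (int i = 0; i < 1000; i++) {
            sum += floor(i * 0.5);
        }
        System.out.println(calls() + " calls, sum = " + sum);
    }
}
```

The counter adds its own overhead, so it is only useful for comparing call counts with the C++ version, not for timing.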

Stephen C
  • checking the number of invocations on the `floor()` call would also be a good measure to see if there are more calls to the method than intended. – posdef Aug 21 '12 at 07:33
  • The problem was solved by using a cast, but the initial question dealt with analyzing the profiler result. Interestingly, the C++ version with all optimizations runs ~13x faster. Lastly, when using a cast, the ArrayList.add function takes up 66% of the time, as expected. – Mikhail Aug 22 '12 at 20:22

Math.floor() is insanely fast on my machine at around 7 nanoseconds per call in a tight loop. (Windows 7, Eclipse, Oracle JDK 7). I'd expect it to be very fast in pretty much all circumstances and would be extremely surprised if it turned out to be the bottleneck.

Some ideas:

  • I'd suggest re-running some benchmarks without a profiler running. It sometimes happens that profilers create spurious overhead when they instrument the binary - particularly for small functions like Math.floor() that are likely to be inlined.
  • Try a couple of different JVMs, you might have hit an obscure bug
  • Try the FastMath class in the excellent Apache Commons Math library, which includes a new implementation of floor. I'd be really surprised if it is faster, but you never know.
  • Check you are not running any virtualisation technology or similar that might be interfering with Java's ability to call native code (which is used in a few of the java.lang.Math functions including Math.floor())
posdef
mikera
  • I think they improved the perf of `floor` since JDK 7(b79), 6u21(b01). See http://bugs.sun.com/view_bug.do?bug_id=6908131 – ewernli Aug 21 '12 at 06:55

First of all: your profiler shows that you're spending 99% of the CPU time in the floor function. This does not indicate that floor is slow. If you do nothing but floor(), that's totally sane. Since other languages seem to implement floor more efficiently, your assumption may be correct, however.

I know from school that a naive implementation of floor (which works only for positive numbers and is one off for negative ones) can be done by casting to an integer/long. That is language agnostic and some sort of general knowledge from CS courses.
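A small sketch of that difference, assuming nothing beyond the standard library; CastVsFloor and castFloor are hypothetical names used only for illustration.

```java
// Hypothetical demo class: compares a truncating cast with Math.floor.
class CastVsFloor {

    // Truncating cast: rounds toward zero, not toward negative infinity.
    static long castFloor(double a) {
        return (long) a;
    }

    public static void main(String[] args) {
        double[] samples = {2.7, 2.0, -2.0, -2.7};
        for (double x : samples) {
            System.out.println(x + ": cast = " + castFloor(x)
                    + ", Math.floor = " + Math.floor(x));
        }
    }
}
```

For -2.7 the cast gives -2 while Math.floor gives -3.0; for the other three samples both agree.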

Here are some micro benches. Works on my machine and backs what I learned in school ;)

rataman@RWW009 ~/Desktop
$ javac Cast.java && java Cast
10000000 Rounds of Casts took 16 ms

rataman@RWW009 ~/Desktop
$ javac Floor.java && java Floor
10000000 Rounds of Floor took 140 ms
#
public class Floor { // Cast.java is identical apart from the class name, the message and the body of floor()

    private static final int ROUNDS = 10000000;

    public static void main(String[] args)
    {
        double[] vals = new double[ROUNDS];
        double[] res = new double[ROUNDS];

        // awesome testdata: random values in [0, 10)
        for(int i = 0; i < ROUNDS; i++)
        {
            vals[i] = Math.random() * 10.0;
        }

        // warmup, so the JIT has compiled the loop before timing starts
        for(int i = 0; i < ROUNDS; i++)
        {
            res[i] = floor(vals[i]);
        }

        // timed run
        long start = System.currentTimeMillis();
        for(int i = 0; i < ROUNDS; i++)
        {
            res[i] = floor(vals[i]);
        }
        System.out.println(ROUNDS + " Rounds of Floor took " + (System.currentTimeMillis() - start) +" ms");
    }

    private static double floor(double arg)
    {
        // Floor.java
        return Math.floor(arg);
        // Cast.java uses this line instead:
        // return (int)arg;
    }

}

atamanroman
  • Well I cannot say if it's twice as fast or 11 times faster. He will have to profile it himself. Take care you don't run out of question marks! :) – atamanroman Aug 21 '12 at 06:34
  • Ha ha :)... I will keep that in mind. But you are answering based on uncertainty, and the answer becomes a suggestion without it. – Starx Aug 21 '12 at 06:41
  • You really should back up your answers with verifiable research, or it will likely be down voted or possibly removed. – Tim Post Aug 21 '12 at 06:48
  • Edited to make the answer more complete. – atamanroman Aug 21 '12 at 09:24

It is worth noting that monitoring a method takes some overhead and in the case of VisualVM, this is fairly high. If you have a method which is called often but does very little it can appear to use lots of CPU. e.g. I have seen Integer.hashCode() as a big hitter once. ;)

On my machine a floor takes about 5.6 ns, but a cast takes 2.3 ns. You might like to try this on your machine.


Unless you need to handle corner cases, a plain cast is faster.

// The class name is arbitrary; it is only here so the listing compiles.
public class FloorTiming {

    // Rounds to zero, instead of negative infinity.
    // The original listing also showed the equivalent form
    // a < 0 ? -(long) -a : (long) a
    public static double floor(double a) {
        return (long) a;
    }

    public static void main(String... args) {
        int size = 100000;
        double[] a = new double[size];
        double[] b = new double[size];
        double[] c = new double[size];
        for (int i = 0; i < a.length; i++) a[i] = Math.random() * 1e6;

        for (int i = 0; i < 5; i++) {
            timeCast(a, b);
            timeFloor(a, c);
            // sanity check: both approaches must agree on this non-negative data
            for (int j = 0; j < size; j++)
                if (b[j] != c[j])
                    System.err.println(a[j] + ": " + b[j] + " " + c[j]);
        }
    }

    private static void timeCast(double[] from, double[] to) {
        long start = System.nanoTime();
        for (int i = 0; i < from.length; i++)
            to[i] = floor(from[i]);
        long time = System.nanoTime() - start;
        System.out.printf("Cast took an average of %.1f ns%n", (double) time / from.length);
    }

    private static void timeFloor(double[] from, double[] to) {
        long start = System.nanoTime();
        for (int i = 0; i < from.length; i++)
            to[i] = Math.floor(from[i]);
        long time = System.nanoTime() - start;
        System.out.printf("Math.floor took an average of %.1f ns%n", (double) time / from.length);
    }
}

prints

Cast took an average of 62.1 ns
Math.floor took an average of 123.6 ns
Cast took an average of 61.9 ns
Math.floor took an average of 6.3 ns
Cast took an average of 47.2 ns
Math.floor took an average of 6.5 ns
Cast took an average of 2.3 ns
Math.floor took an average of 5.6 ns
Cast took an average of 2.3 ns
Math.floor took an average of 5.6 ns
Peter Lawrey
  • Thanks for the suggestion. I tried doing something like this and it doesn't help. – Mikhail Aug 21 '12 at 07:30
  • @misha As I mention, visual vm can throw up some very short method called often as being far more important than they really are. Even in commercial profilers you need to be careful about the results you get. Without the profiler, try calling Math.floor twice to see how much slower it is, this will give you an idea as to how much time it is adding. – Peter Lawrey Aug 21 '12 at 07:39
  • This is interesting profiling information. I will take another look at my instrumentation. My runs were done without a profiler. – Mikhail Aug 21 '12 at 07:40
  • @misha You can't remove the floor without changing the behaviour but you can call it two or three times to see how much slower it is ;) – Peter Lawrey Aug 21 '12 at 07:44
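The "call it twice" idea from the comments above can be sketched roughly as follows. This is a standalone illustration on random data, not the asker's code; FloorTwice, once and twice are hypothetical names. In the real program you would double the call at the actual call site and compare wall-clock times without a profiler attached.

```java
// Hypothetical demo: compares a loop with one Math.floor call per element
// against the same loop with two calls per element.
class FloorTwice {

    static double once(double[] vals) {
        double sum = 0;
        for (double v : vals) sum += Math.floor(v);
        return sum;
    }

    // The second call does not change the result, it only adds work.
    static double twice(double[] vals) {
        double sum = 0;
        for (double v : vals) sum += Math.floor(Math.floor(v));
        return sum;
    }

    public static void main(String[] args) {
        double[] vals = new double[1000000];
        for (int i = 0; i < vals.length; i++) vals[i] = Math.random() * 1e6;

        // Repeat a few times so the JIT can warm up; trust only the later runs.
        for (int run = 0; run < 5; run++) {
            long t0 = System.nanoTime();
            double a = once(vals);
            long t1 = System.nanoTime();
            double b = twice(vals);
            long t2 = System.nanoTime();
            System.out.println("once: " + (t1 - t0) / 1000 + " us, twice: "
                    + (t2 - t1) / 1000 + " us, sums equal: " + (a == b));
        }
    }
}
```

If the extra call barely changes the runtime, floor is unlikely to be the real bottleneck. The JIT may optimize such a small loop in ways that differ from the real program, so treat the numbers as a rough indication only.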

Math.floor (and Math.ceil) can be a surprising bottleneck if your algorithm depends on it a lot. This is because these functions handle edge cases that you might not care about (such as minus-zero and positive-zero etc). Just look at the implementation of these functions to see what they're actually doing; there's a surprising amount of branching in there.

Also consider that Math.floor/ceil take only a double as an argument and return a double, which you might not want. If you just want an int or long, some of the checks in Math.floor are simply unnecessary.

Some have suggested to simply cast to an int, which will work as long as your values are positive (and your algorithm doesn't depend on the edge cases that Math.floor checks for). If that's the case, a simple cast is the fastest solution by quite a margin (in my experience).

If for example your values can be negative and you want an int from a float, you can do something like this:

public static final int floor(final float value) {
    return ((int) value) - (Float.floatToRawIntBits(value) >>> 31);
}

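(It just subtracts the float's sign bit from the cast to make it correct for negative numbers, while avoiding an "if". Note that it is off by one for negative values that are already whole numbers, e.g. -2.0f gives -3, and for -0.0f, which gives -1.)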

In my experience, this is a lot faster than Math.floor. If it isn't, I suggest checking your algorithm, or perhaps you've run into a JVM performance bug (which is much less likely).
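The sign-bit trick above can be checked against Math.floor on a few inputs. The sketch below also includes a comparison-based variant, compareFloor, which is my own assumption and not part of the answer; all class and method names are hypothetical.

```java
// Hypothetical demo: checks the sign-bit trick against Math.floor.
class SignBitFloor {

    // The trick from the answer: subtract the sign bit from the truncated value.
    static int signBitFloor(float value) {
        return ((int) value) - (Float.floatToRawIntBits(value) >>> 31);
    }

    // Assumed alternative (not from the answer): step down by one only when
    // truncation actually rounded up, i.e. for negative non-whole values.
    static int compareFloor(float value) {
        int i = (int) value;
        return value < i ? i - 1 : i;
    }

    public static void main(String[] args) {
        float[] samples = {2.5f, 0.0f, -0.5f, -2.5f, -2.0f};
        for (float x : samples) {
            System.out.println(x + ": signBit = " + signBitFloor(x)
                    + ", compare = " + compareFloor(x)
                    + ", Math.floor = " + (int) Math.floor(x));
        }
    }
}
```

For -2.0f the sign-bit version returns -3 while Math.floor and compareFloor return -2; the other samples agree. Whether the comparison costs anything measurable depends on how the JIT compiles it, so it is worth timing both on your own data. Both variants saturate for values outside the int range.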

erikd