5

I am computing the bounding box of a 3d point cloud in Clojure. The point cloud is represented as a Java primitive float array, and every point in the point cloud is stored using 4 floats where the last float is unused. Like this:

[x0 y0 z0 u0  x1 y1 z1 u1 .... ]

The size of the point cloud is 19200, that is 4*19200 floats in the array.

Some of these values may not be finite (either they are infinite or NaN). So any point that contains a non-finite value should be entirely excluded from the computation. I have implemented this computation both in Java and in Clojure, but for some reason, the Clojure version is still about 4 to 5 times slower.

This is what the Clojure code looks like:

(defn good-compute-bbox [^floats data]
  (let [n (alength data)]
    (loop [i (int 0)
           minx (float (aget data 0))
           maxx (float (aget data 0))
           miny (float (aget data 1))
           maxy (float (aget data 1))
           minz (float (aget data 2))
           maxz (float (aget data 2))]
      (if (< i n)
        (let [x (float (aget data (unchecked-add i 0)))
              y (float (aget data (unchecked-add i 1)))
              z (float (aget data (unchecked-add i 2)))]
          (if (and (Float/isFinite x)
                   (Float/isFinite y)
                   (Float/isFinite z))
            (recur (unchecked-add-int i (int 4))
                   (min minx x)
                   (max maxx x)
                   (min miny y)
                   (max maxy y)
                   (min minz z)
                   (max maxz z))
            (recur (unchecked-add-int i (int 4))
                   minx
                   maxx
                   miny
                   maxy
                   minz
                   maxz
                   ))
          )
        [minx maxx miny maxy minz maxz]))))

and this is what the Java code looks like:

public class BBox {

    public static float[] computeBBox(float[] data) {
        long n = data.length;
        float[] result = new float[6];

        float minx = data[0];
        float maxx = data[0];
        float miny = data[1];
        float maxy = data[1];
        float minz = data[2];
        float maxz = data[2];
        for (int i = 0; i < n; i += 4) {
            float x = data[i + 0];
            float y = data[i + 1];
            float z = data[i + 2];
            if (java.lang.Float.isFinite(x) &&
                java.lang.Float.isFinite(y) &&
                java.lang.Float.isFinite(z)) {

                minx = x < minx? x : minx;
                maxx = x > maxx? x : maxx;

                miny = y < miny? y : miny;
                maxy = y > maxy? y : maxy;

                minz = z < minz? z : minz;
                maxz = z > maxz? z : maxz;
            }
        }

        result[0] = minx;
        result[1] = maxx;
        result[2] = miny;
        result[3] = maxy;
        result[4] = minz;
        result[5] = maxz;

        return result;
    }
};

My question is: What changes can I make to my Clojure code to make it run as fast as the Java code? Bonus point if you tested your code and measured the speed-up.

If you want to reproduce the experiment and published on my Github repository here.

Rulle
  • 4,496
  • 1
  • 15
  • 21

1 Answers1

4

The culprit here --as I can see from your code that you've already suspected-- is min. It works with all kinds of types and isn't all that fast.

You can get within 10% of the Java by using Apache's commons fast math:

(defn primitive-compute-bbox-new [^floats data]
  (let [n (alength data)]
    (loop [i (int 0)
           minx (aget data 0)
           maxx (aget data 0)
           miny (aget data 1)
           maxy (aget data 1)
           minz (aget data 2)
           maxz (aget data 2)]
      (if (< i n)
        (let [x (aget data (unchecked-add i 0))
              y (aget data (unchecked-add i 1))
              z (aget data (unchecked-add i 2))]
          (if (and (Float/isFinite x)
                   (Float/isFinite y)
                   (Float/isFinite z))
            (recur (unchecked-add-int i 4)
                   (FastMath/min (float minx) x)
                   (FastMath/max (float maxx) x)
                   (FastMath/min (float miny) y)
                   (FastMath/max (float maxy) y)
                   (FastMath/min (float minz) z)
                   (FastMath/max (float maxz) z))
            (recur (unchecked-add-int i 4)
                   minx
                   maxx
                   miny
                   maxy
                   minz
                   maxz)))
        [minx maxx miny maxy minz maxz]))))

Dep:

[org.apache.commons/commons-math3 "3.6.1"]
ClojureMostly
  • 4,652
  • 2
  • 22
  • 24
  • Thank you very much for taking your time to answer this question. I incorporated your answer with this commit: https://github.com/jonasseglare/pc-test/commit/e11223b840e367bcf3e40642742edeff7d0196d9 . However, I am not observering any significant performance improvement, using the benchmark function (run-benchmark) that I added. But it could of course be that my benchmarking is flawed in some way, I don't know. Thanks anyway. – Rulle Mar 24 '18 at 12:35
  • Did you run the benchmarking function several times? The first run is not representative. When I repeatedly run your benchmark code the above version is within 5-10% of the Java version. – ClojureMostly Mar 24 '18 at 19:37
  • I did run it several times, but it seems to stabilize at java-version "Elapsed time: 20.761929 msecs" good-compute-bbox "Elapsed time: 80.856727 msecs" primitive-compute-bbox-new "Elapsed time: 99.636311 msecs" But there could be many reasons why the difference is smaller for you... such as JDK version. – Rulle Apr 10 '18 at 18:51