Clojure Performance For Expensive Algorithms

Question

I have implemented an algorithm to calculate the longest contiguous common subsequence (not to be confused with the longest common subsequence, though the distinction is not important for this question). I need to squeeze maximum performance out of it because I'll be calling it a lot. I have implemented the same algorithm in Clojure and Java in order to compare performance. The Java version runs significantly faster. My question is whether there is anything I can do to the Clojure version to speed it up to the level of Java.

Here's the Java code:

public static int lcs(String[] a1, String[] a2) {
    if (a1 == null || a2 == null) {
        return 0;
    }

    int matchLen = 0;
    int maxLen = 0;

    int a1Len = a1.length;
    int a2Len = a2.length;
    int[] prev = new int[a2Len + 1]; // holds data from previous iteration of inner for loop
    int[] curr = new int[a2Len + 1]; // used for the 'current' iteration of inner for loop

    for (int i = 0; i < a1Len; ++i) {
        for (int j = 0; j < a2Len; ++j) {
            if (a1[i].equals(a2[j])) {
                matchLen = prev[j] + 1; // curr and prev are padded by 1 to allow for this assignment when j=0
            }
            else {
                matchLen = 0;
            }
            curr[j+1] = matchLen;

            if (matchLen > maxLen) {
                maxLen = matchLen;
            }
        }

        int[] swap = prev;
        prev = curr;
        curr = swap;
    }

    return maxLen;
}

Here is the Clojure version of the same:

(defn lcs
  [#^"[Ljava.lang.String;" a1 #^"[Ljava.lang.String;" a2]
  (let [a1-len (alength a1)
        a2-len (alength a2)
        prev (int-array (inc a2-len))
        curr (int-array (inc a2-len))]
    (loop [i 0 max-len 0 prev prev curr curr]
      (if (< i a1-len)
        (recur (inc i)
               (loop [j 0 max-len max-len]
                 (if (< j a2-len)
                   (if (= (aget a1 i) (aget a2 j))
                     (let [match-len (inc (aget prev j))]
                       (do
                         (aset-int curr (inc j) match-len)
                         (recur (inc j) (max max-len match-len))))
                     (do
                       (aset-int curr (inc j) 0)
                       (recur (inc j) max-len)))
                   max-len))
               curr
               prev)
        max-len))))
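
Before timing either version, it can help to pin down what is being computed. The following is a minimal sketch of a naive reference implementation for cross-checking results on small inputs. The name `lcs-reference` is hypothetical (it is not part of the original code), and the approach is far too slow for the benchmark arrays.

```clojure
;; Hypothetical helper for sanity checks only: roughly O(n * m * k) time.
(defn lcs-reference
  "Length of the longest run of adjacent, pairwise-equal elements
  that appears in both v1 and v2."
  [v1 v2]
  (let [v1 (vec v1)
        v2 (vec v2)]
    (reduce max 0
            (for [i (range (count v1))
                  j (range (count v2))]
              ;; count how far the two sequences agree starting at (i, j)
              (count (take-while true?
                                 (map = (subvec v1 i) (subvec v2 j))))))))

;; "b" "c" is the longest shared contiguous run
(lcs-reference ["a" "b" "c" "d"] ["x" "b" "c" "y"])   ;=> 2

;; "a" and "b" are both shared, but they are not adjacent in the first input
(lcs-reference ["a" "x" "b"] ["a" "b"])               ;=> 1
```

On small random arrays, the result of `lcs` should equal the result of `lcs-reference` for any optimized variant, which gives a cheap regression check while tuning.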

Now let's test these on my machine:

(def pool "ABC")
(defn get-random-id [n] (apply str (repeatedly n #(rand-nth pool))))
(def a1 (into-array (take 10000 (repeatedly #(get-random-id 5)))))
(def a2 (into-array (take 10000 (repeatedly #(get-random-id 5)))))

Java:

(time (Ratcliff/lcs a1 a2))
"Elapsed time: 1521.455 msecs"

Clojure:

(time (lcs a1 a2))
"Elapsed time: 19863.633 msecs"

Clojure is quick but still an order of magnitude slower than Java. Is there anything I can do to close this gap? Or have I maxed it out, and one order of magnitude is the "minimal Clojure overhead"?

As you can see I am already using the "low level" construct of loop, I am using native Java arrays and I have type-hinted the parameters to avoid reflection.

There are some algorithmic optimizations possible, but I don't want to go there right now. I am curious how close to Java performance I can get. If I can't close the gap I'll just go with the Java code. The rest of this project is in Clojure, but perhaps sometimes dropping down to Java for performance is necessary.

First, put `(set! *warn-on-reflection* true)` at the top of your Clojure implementation namespace and reload, noting any warnings, then address them. Second, ideally, use https://github.com/hugoduncan/criterium for benchmarking. Next, check it in a profiler... — Hendekagon, Feb 19 '13 at 04:35
Thanks, Hendekagon! All good tips. I got rid of the Auto-boxing warning (by wrapping call to inner loop with (int ) ) but that didn't improve performance. Then I turned all calls to inc to unchecked-inc. That didn't have any impact either. — Geo G, Feb 19 '13 at 05:40
This doesn't answer your question, but a naive idiomatic Clojure implementation like [this](https://gist.github.com/anonymous/4983479), may be slower but it's easier to reason about and works with any type, and it finds sequences across the overlap between the end and start (and it was a billion times easier to write than the Java one would be). If only there was a way to make this code run as fast as Java! — Hendekagon, Feb 19 '13 at 06:12
Yeah, I actually started with an idiomatic implementation and that was too slow for this task. What I'm really trying to do is port some of the functionality of the Python difflib.SequenceMatcher for my Clojure project. — Geo G, Feb 19 '13 at 06:31
Have you used `(set! *unchecked-math* true)` ? This can make a difference for low level numerical code. — mikera, Feb 19 '13 at 07:10
GeoG - would you mind posting your idiomatic code for comparison please ? — Hendekagon, Feb 19 '13 at 08:07
@Hendekagon Unfortunately that code is gone. I realized pretty quickly it wasn't going to be fast enough. But don't worry, it was nothing to write home about :) — Geo G, Feb 19 '13 at 13:54
@mikera Tried `(set! *unchecked-math* true)` and it had very little impact, if any. Thanks for the tip though. Good to know that option exists. — Geo G, Feb 19 '13 at 14:02
@cgrand Just benchmarked it with criterium. Indeed 30% faster and pretty much around Java speed! Thanks! This was very helpful. I learned a lot. — Geo G, Feb 21 '13 at 19:58
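
The suggestions in these comments can be sketched as follows. This is a minimal illustration, not the asker's actual setup: `str-len-hinted` and `best-of-ms` are hypothetical names, and the timing helper is only a crude stand-in for criterium, which handles JIT warm-up and statistics properly.

```clojure
;; Both vars are settable at the REPL or at the top of a namespace file.
(set! *warn-on-reflection* true)  ; print a warning for each reflective call
(set! *unchecked-math* true)      ; compile +, inc, etc. without overflow checks

;; With the String hint, .length compiles to a direct method call and no
;; reflection warning is printed; without the hint the compiler would warn.
(defn str-len-hinted [^String s]
  (.length s))

;; Crude timing helper (an assumption, not criterium): run f a few times
;; so the JIT can warm up, then return the best of `runs` timings in ms.
(defn best-of-ms [f warmups runs]
  (dotimes [_ warmups] (f))
  (apply min
         (repeatedly runs
                     #(let [start (System/nanoTime)]
                        (f)
                        (/ (- (System/nanoTime) start) 1e6)))))

;; Example call: time a trivial function after 1000 warm-up runs.
(best-of-ms #(str-len-hinted "ABCAB") 1000 5)
```

Taking the minimum over several runs reduces noise from garbage collection and compilation, but for numbers worth quoting, a dedicated benchmarking library is the safer choice.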

cgrand · Accepted Answer · 2013-02-20T13:07:31.390

score 13

EDIT: Added a faster uglier version below the first one.

Here is my take:

(defn my-lcs [^objects a1 ^objects a2]
  (first
    (let [n (inc (alength a2))] ; buffers are indexed by j over a2, padded by 1
      (areduce a1 i 
        [max-len ^ints prev ^ints curr] [0 (int-array n) (int-array n)]
        [(areduce a2 j max-len (unchecked-long max-len)
           (let [match-len 
                 (if (.equals (aget a1 i) (aget a2 j))
                   (unchecked-inc (aget prev j))
                   0)]
             (aset curr (unchecked-inc j) match-len)
             (if (> match-len max-len)
               match-len
               max-len)))
         curr prev]))))

Main differences from yours: plain `aget`/`aset` instead of `aset-int`; unchecked ops (implicitly, through `areduce`); a vector as the return value (which also serves as the "swap" mechanism); and `max-len` is coerced to a primitive before the inner loop. Primitive-valued loops are problematic, slightly less so since 1.5 RC2, but the support isn't perfect yet; `*warn-on-reflection*` does at least report the problem.

And I switched to .equals instead of = to avoid the logic in Clojure's equiv.
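
To make two of these differences concrete, here is a small sketch with hypothetical function names chosen for illustration. As far as I know, `aset-int` goes through `java.lang.reflect.Array/setInt`, whereas `aset` on an array hinted as `^ints` compiles to a direct array store; likewise `=` routes through Clojure's `equiv` logic, while `.equals` is a plain Java method call. Each pair computes the same result on the inputs shown, so the two halves can be timed against each other.

```clojure
;; Hypothetical helpers: each pair does the same work two different ways.

(defn fill-with-aset-int [^ints arr]
  (dotimes [i (alength arr)]
    (aset-int arr i i))        ; the slower path discussed above
  arr)

(defn fill-with-aset [^ints arr]
  (dotimes [i (alength arr)]
    (aset arr i (int i)))      ; direct store into the hinted int array
  arr)

(defn count-matches-equiv [^objects a ^objects b]
  (let [n (min (alength a) (alength b))]
    (loop [i 0 acc 0]
      (if (< i n)
        (recur (inc i) (if (= (aget a i) (aget b i)) (inc acc) acc))
        acc))))

(defn count-matches-equals [^objects a ^objects b]
  (let [n (min (alength a) (alength b))]
    (loop [i 0 acc 0]
      (if (< i n)
        ;; .equals throws on a nil element in `a`; these arrays hold strings
        (recur (inc i) (if (.equals (aget a i) (aget b i)) (inc acc) acc))
        acc))))

(def xs (into-array Object ["AB" "BC" "CA"]))
(def ys (into-array Object ["AB" "CC" "CA"]))

(count-matches-equiv xs ys)    ;=> 2
(count-matches-equals xs ys)   ;=> 2
```

The trade-off with `.equals` is that it skips Clojure's cross-type numeric and collection equality and fails on `nil` receivers, which is acceptable here because the arrays contain only non-nil strings.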

EDIT: let's get ugly and restore the arrays swap trick:

(deftype F [^:unsynchronized-mutable ^ints curr
            ^:unsynchronized-mutable ^ints prev]
  clojure.lang.IFn
  (invoke [_ a1 a2]
    (let [^objects a1 a1
          ^objects a2 a2]
      (areduce a1 i max-len 0
        (let [m (areduce a2 j max-len (unchecked-long max-len)
                  (let [match-len 
                        (if (.equals (aget a1 i) (aget a2 j))
                          (unchecked-inc (aget prev j))
                          0)]
                    (aset curr (unchecked-inc j) (unchecked-int match-len))
                    (if (> match-len max-len)
                      match-len
                      max-len)))
              bak curr]
          (set! curr prev)
          (set! prev bak)
          m)))))

(defn my-lcs2 [^objects a1 ^objects a2]
  (let [n (inc (alength a2)) ; buffers are indexed by j over a2, padded by 1
        f (F. (int-array n) (int-array n))]
    (f a1 a2)))

On my box, it's 30% faster.
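
The core of the second version is the pair of mutable fields that get swapped in place. Below is a stripped-down sketch of just that mechanism, using hypothetical names (`DoubleBuffer`, `IntBuffers`, `current`, `flip`) that are not part of the code above.

```clojure
(defprotocol DoubleBuffer
  (current [this] "Return the array currently being written to.")
  (flip [this] "Swap the current and previous arrays in place."))

;; Fields marked ^:unsynchronized-mutable can be reassigned with set!,
;; but only from inside the type's own method bodies.
(deftype IntBuffers [^:unsynchronized-mutable ^ints curr
                     ^:unsynchronized-mutable ^ints prev]
  DoubleBuffer
  (current [_] curr)
  (flip [this]
    (let [bak curr]
      (set! curr prev)
      (set! prev bak)
      this)))

(def a (int-array 3))
(def b (int-array 3))
(def bufs (IntBuffers. a b))

(identical? a (current bufs))   ;=> true
(flip bufs)
(identical? b (current bufs))   ;=> true
```

A swap allocates nothing, which is the point of the trick. The fields are not thread-safe, so an instance like this should stay confined to a single call on a single thread.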

edited Feb 20 '13 at 13:07

answered Feb 19 '13 at 13:31

cgrand


Thanks. I tested each modification separately. As discussed above aset is indeed ~3x faster than aset-int. So that brought it down to 4x running time of Java. The unchecked math didn't make any noticeable difference. But using .equals instead of = gave another 3-4x improvement! With all of the optimizations so far I am at ~1.5 secs for Java and ~1.9 secs for Clojure. Not bad at all! – Geo G Feb 19 '13 at 15:49
Your version runs very close to java ~1.9 secs vs. ~1.5 secs, but as I said above I believe the performance gains are coming from switching to aset and .equals. I get the same performance as your version by switching to aset and .equals and with no other modifications. – Geo G Feb 19 '13 at 15:56
You could probably just use `==` instead of `.equals` – dnolen Feb 19 '13 at 17:01
With `*unchecked-math* true` I don't think you need `unchecked-*` functions. – Marko Topolnik Feb 19 '13 at 17:14
@cgrand - thanks also for providing an alternative implementation using areduce. It feels a bit more Clojurely than looping, though both are not too bad. – Geo G Feb 19 '13 at 17:24
@GeoG Exactly, you need `(long ...)` to take you into the world of primitives, but you don't need that transition to be explicitly `unchecked`. – Marko Topolnik Feb 19 '13 at 20:06
The second version runs at Java speed! I'm gonna have to dig into it deeper when I have a chance to fully understand everything. – Geo G Feb 21 '13 at 20:02
this code is more about the compiler than the problem at hand – Hendekagon Feb 25 '13 at 00:48

score 6 · Answer 2 · answered Feb 19 '13 at 06:08

Here are a couple improvements:

- No advantage to fancy type hinting, just use ^objects
- aset-int is deprecated I believe -- plain old aset is faster, by about 3x overall it seems

Beyond that (and the long type hint on the recur mentioned above), I don't see any obvious ways to improve further.

(defn lcs
  [^objects a1 ^objects a2]
  (let [a1-len (alength a1)
        a2-len (alength a2)
        prev (int-array (inc a2-len))
        curr (int-array (inc a2-len))]
    (loop [i 0 max-len 0 prev prev curr curr]
      (if (< i a1-len)
        (recur (inc i)
               (long (loop [j 0 max-len max-len]
                       (if (< j a2-len)
                         (if (= (aget a1 i) (aget a2 j))
                           (let [match-len (inc (aget prev j))]
                             (do
                               (aset curr (inc j) match-len)
                               (recur (inc j) (max max-len match-len))))
                           (do
                             (aset curr (inc j) 0)
                             (recur (inc j) max-len)))
                         max-len)))
               curr
               prev)
        max-len))))
#'user/lcs
user> (time (lcs a1 a2))
"Elapsed time: 3862.211 msecs"

aset is indeed 3 times faster! Thanks for that one. On my machine I am at ~1.5 secs for Java and ~6 secs for Clojure. Not bad. Also I tried (long ) rather than (int ) but it made no difference. — Geo G, Feb 19 '13 at 06:23
Also changed the parameters' type hint to ^objects rather than #^"[Ljava.lang.String;" with no impact on performance. This is surprising. What are you hinting with ^objects that Clojure doesn't already know? — Geo G, Feb 19 '13 at 06:26
Replacing max with Math/max doesn't seem to produce a significant improvement. Also replacing it with (if (> match-len max-len) match-len max-len) doesn't seem to have any impact either. — Geo G, Feb 19 '13 at 17:10
