While solving a problem from HackerRank (https://www.hackerrank.com/challenges/string-compression/problem), I wrote two implementations: one with transducers and one without.

I was expecting the transducer implementation to be faster than the one using the function chaining operator ->>. Unfortunately, according to my mini-benchmark, the chaining operator outperformed the transducer by 2.5 times.

I thought that I should use transducers wherever possible. Or did I misunderstand the concept of transducers?

Time:

"Elapsed time: 0.844459 msecs"

"Elapsed time: 2.697836 msecs"

Code:

(defn string-compression-2
  [s]
  (->> s
       (partition-by identity)
       (mapcat #(if (> (count %) 1)
               (list (first %) (count %))
               (list (first %))))
       (apply str)))

(def xform-str-compr
  (comp (partition-by identity)
        (mapcat #(if (> (count %) 1)
                (list (first %) (count %))
                (list (first %))))))

(defn string-compression-3
  [s]
  (transduce xform-str-compr str s))

(time (string-compression-2 "aaabccdddd"))
(time (string-compression-3 "aaabccdddd"))
denis631
  • I've realized that when running the functions inside the REPL, performance was more or less equal (the transducer solution was faster by 0.001), at ~0.1 msecs. Maybe I should post another question, but why does the REPL outperform running the .clj file? – denis631 Feb 09 '18 at 13:44
  • There is too much noise when running one test. Run the same test N times in a loop and compute the total time. – coredump Feb 09 '18 at 13:46
  • Answer: because the version with the transducer is using `str` repeatedly, as a reducing function. – Vincent Cantin Oct 19 '18 at 06:56
  • @Vincent It took me a while to get it :D I wrote it down in the conversation under the answer :sweat_smile – denis631 Oct 19 '18 at 20:24
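
Vincent Cantin's point about `str` being used repeatedly can be made visible with a small sketch. The helper `tracing-str` and the atom `calls` are hypothetical names introduced here for illustration only; the wrapper behaves like `str` but records the arguments of every call that `transduce` makes to it.

```clojure
;; Same transformation as xform-str-compr in the question.
(def xform
  (comp (partition-by identity)
        (mapcat #(if (> (count %) 1)
                   (list (first %) (count %))
                   (list (first %))))))

;; Hypothetical helper state: collects the arguments of each call.
(def calls (atom []))

(defn tracing-str
  "Behaves like `str`, but records the arguments of every call in `calls`."
  [& args]
  (swap! calls conj (vec args))
  (apply str args))

(def result (transduce xform tracing-str "aaabcc"))

;; Inspect the recorded calls: one zero-argument call for the initial value,
;; one two-argument call per emitted element, and one final completion call.
(run! prn @calls)
```

Every two-argument call builds a new string from the accumulated string plus one element, whereas `(apply str ...)` in the `->>` version makes a single call over all elements.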

1 Answer

The transducer version does seem to be faster, according to Criterium:

(crit/quick-bench (string-compression-2 "aaabccdddd"))
             Execution time mean : 6.150477 µs
    Execution time std-deviation : 246.740784 ns
   Execution time lower quantile : 5.769961 µs ( 2.5%)
   Execution time upper quantile : 6.398563 µs (97.5%)
                   Overhead used : 1.620718 ns

(crit/quick-bench (string-compression-3 "aaabccdddd"))
             Execution time mean : 2.533919 µs
    Execution time std-deviation : 157.594154 ns
   Execution time lower quantile : 2.341610 µs ( 2.5%)
   Execution time upper quantile : 2.704182 µs (97.5%)
                   Overhead used : 1.620718 ns

As coredump commented, a sample size of one is not enough to say whether one approach is generally faster than the other.
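
When Criterium is not at hand, coredump's suggestion of running the test N times in a loop can be sketched roughly as follows. `time-n` is a hypothetical helper, not part of any library, and the warm-up count of 1000 is an arbitrary assumption; Criterium handles warm-up and statistics far more carefully.

```clojure
(defn string-compression-2
  [s]
  (->> s
       (partition-by identity)
       (mapcat #(if (> (count %) 1)
                  (list (first %) (count %))
                  (list (first %))))
       (apply str)))

(defn time-n
  "Calls (f) n times after a warm-up phase and returns the mean time
  per call in milliseconds. Hypothetical helper for illustration."
  [n f]
  (dotimes [_ 1000] (f))                 ; warm-up, gives the JIT a chance to compile f
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (- (System/nanoTime) start) 1e6 n)))

;; Mean milliseconds per call over 10000 runs.
(println (time-n 10000 #(string-compression-2 "aaabccdddd")))
```

The same call with the transducer version in place of `string-compression-2` gives a comparable figure for the other implementation.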

Taylor Wood
  • So should I always use function composition and transducers instead of function chaining? – denis631 Feb 09 '18 at 13:50
  • If performance is your primary concern, you just have to benchmark (with a range of inputs) to see which approach outperforms. I wouldn't say _always_ use transducers; they have performance benefits in many cases but they have non-performance related benefits too. "It depends" :) – Taylor Wood Feb 09 '18 at 13:54
  • But my functions are not completely the same, right? I've been watching a transducer video by Rich, and he showed the abstraction of `map` with the help of a `step function`. So in my case the step function is `str`, which means that `str` is called on all my intermediate results, right? Whereas the example with `->>` calls `str` only once, which should definitely be faster, right? My point is, it wasn't smart to use `str` as a step function, right? What are your thoughts on this? – denis631 Feb 10 '18 at 09:53
  • FWIW, the [xforms](https://github.com/cgrand/xforms/) library has a `net.cgrand.xforms.rfs/str` reducing function that uses a StringBuilder to avoid that problem. – madstap Feb 10 '18 at 21:56