Java is almost twice as fast as Rust?!? What am I missing?

Question

I ported a Java class that I have that loops over all possible unordered combinations of choosing k elements from n choices to Rust, expecting Rust to help me speed up the calculations. But when running both head-to-head, it turned out that Java was almost two times faster!

Since this does not sound right to me at all and since I'm just getting started in Rust, I must be doing something wrong and was wondering if someone with more Rust experience would help me figure out why my Rust code is so much slower.

Here is my Java code for the generic interface, the implementation, and the test-code:

public interface Choice<Type> {

    /**
     * Returns the number of possible options that this choice provides.
     * 
     * @return the number of choices
     */
    public long getChoices();
    
    /**
     * Returns the choice with the given index.
     *
     * @param index - the index of the choice to return
     * @return the choice of the given index
     */ 
    public Type getChoice(long index);
}

public class NChooseK implements Choice<int[]> {
    
    private final int n;
    private final int k;
    
    private final long count;
    
    public NChooseK(int n, int k) {
        if ((n<=0) || (k<0) || (k>n)) throw new IllegalArgumentException();
        this.n     = n;
        this.k     = k;
        this.count = choose(n, k);
    }

    @Override
    public long getChoices() {
        return count;
    }

    @Override
    public int[] getChoice(long m) {
        if (k==0) return new int[0];
        
        long  count = this.count;
        int[] result = new int[this.k];
        int   n = this.n;
        int   k = this.k;
        long  x = (count-1) - m;
        
        while (true) {
            if (n == k) {
                while (true) {
                    result[this.k - k] = this.n - k;
                    if (k==1) return result;
                    k--;
                }
            }
            
            count = count * (n-k) / n;
            if (x >= count) {
                result[this.k - k] = this.n - n;
                if (k==1) return result;
                
                x -= count;
                count = count * k / (n-k);

                k--;
            }
            n--;
        }
    }

    private long choose(int n, int k) {
        if (n<k) return 0;
        if (k>n-k) k=n-k;
        
        long b=1;
        for (int i=1, m=n; i<=k; i++, m--)
            b = b*m/i;
        return b;
    }
}

public class Test {
    public static void main(String[] args) {

        NChooseK nck = new NChooseK(26, 13);
        long choices = nck.getChoices();
        System.out.println("Running ("+choices+" choices)...");
        long start = System.currentTimeMillis();
        for (long index = 0; index<choices; index++) {
            int[] choice = nck.getChoice(index);
            //System.out.println(ArrayTools.toString(choice));
        }
        long end = System.currentTimeMillis();
        System.out.println("Done ("+((end - start)/1000.0)+"s)!");
    }
}

And here is what I thought would be the closest translation to Rust:

pub trait Choice<Type> {
    
    /// Returns the number of possibilities for this choice.
    fn get_choices(&self) -> u32;
    
    /// Returns the posibility of the given index.
    fn get_choice(&self, index: u32) -> Type;
}

use super::choice::Choice;

pub struct NChooseK {
    n: u32,
    k: u32,
    count: u32,
}

impl NChooseK {
    pub fn new(n: u32, k: u32) -> Result<NChooseK, &'static str> {
        if k > n {
            Err("invalid parameters: k cannot be larger than n")
        } else {
            Ok(NChooseK {
                n: n,
                k: k,
                count: choose(n, k).unwrap() as u32,
            })
        }
    }
}

impl<'a> Choice<Vec<u32>> for NChooseK {
        
    fn get_choices(&self) -> u32 {
        self.count
    }
    
    fn get_choice(&self, m: u32) -> Vec<u32> {
        
        if self.k == 0 {
            return vec![];
        }
        
        let mut count = self.count;
        let mut result:Vec<u32> = Vec::with_capacity(self.k as usize);
        let mut n = self.n;
        let mut k = self.k;
        let mut x = (count-1) - m;
        
        loop {
            if n == k {
                loop {
                    result.push(self.n - k);
                    if k == 1 {
                        return result;
                    }
                    
                    k -= 1;
                }
            }

            count = count * (n - k) / n;
            if x >= count {
                result.push(self.n - n);
                if k == 1 {
                    return result;
                }
                
                x -= count;
                count = count * k / (n - k);

                k -= 1;
            }

            n -= 1;
        }
    }
}


fn choose(n: u32, mut k: u32) -> Option<u64> {
    
    if k > n-k {
        k = n-k;    
    }
    
    let mut b : u64 = 1;
    let mut m = n;
    for i in 1..=k {
        if b > 0xFFFFFFFFFFFFFFFF / (m as u64) {
            return None;
        }
        b  = b * (m as u64) / (i as u64);
        m -= 1;
    }
        
    Some(b)
}

fn main() {
    let nck = NChooseK::new(26, 13).unwrap();
    let choices = nck.get_choices();
    println!("Running ({} choices)...", choices);
    let start = time::precise_time_s();
    for index in 0..choices {
        let choice = nck.get_choice(index);
        //println!("{:?}", choice);
    }
    let end = time::precise_time_s();
    println!("Done ({}s)!", end - start);
}

The Rust code takes around 12 to 12.5s to run through the ~10 million calls to get_choice and the Java code takes 6.5 to 7s! WTF?!?!?

This is using rustc v1.45.2 and OpenJDK v1.8.0_212-3-redhat on Windows 7 64-bit.

Notes:

I initially had get_choices also return a u64 in Rust, but changed it to a u32 to try to eliminate as much type casting as possible (didn't actually make a difference, though).
I also tried replacing all u32s with i32s. Also no difference.
Commenting out the result.push(...) statements lowers the run-time to 9 to 9.5s. That's a big difference (bigger than I expected!), but still way slower than Java!
If I uncomment the print statements inside the inner loop and change the parameters to something more reasonable (NChooseK(7, 3), for example), both versions do produce the same exact output values (ArrayTools.toString(...) is just a simple helper-function that concatenates the components of an int-array with commas into a string; decided to leave it out since there is already so much code here)

Have you tried running with `cargo run --release` in order to enable optimizations? — Coder-256, Aug 12 '20 at 05:51
Also there's `Duration::now` and [`Duration::elapsed`](https://doc.rust-lang.org/std/time/struct.Instant.html#method.elapsed) to get the Duration between two time points. — hellow, Aug 12 '20 at 05:55
@Coder-256 I had just used the "run" button in the IDE. But I just tried it from the command line and it does make a big difference. With `--release`, Rust now takes 4.5 to 5s, which is faster than Java, although still not as much as I was hoping for. — Markus A., Aug 12 '20 at 05:56
@Coder-256 If you like, you can turn your comment into an answer, since this definitely does solve the mystery. (Also, thank you for the rust code formatting edits!) — Markus A., Aug 12 '20 at 05:57
Even though you found your answer now, I would like to point out that there is nothing surprising *in general* about your results. In fact, development of Java started in 1990, Rust in 2006, so the Java community had more than *twice* the amount of time of the Rust community to tweak their implementations. Plus, the Java community is throwing more money and more people at the problem of making Java fast than the Rust community is, so I would actually *expect* Java to be faster. According to Wikipedia, "Idiomatic Rust can be as fast as idiomatic C++". Well idiomatic Java can be *faster* than … — Jörg W Mittag, Aug 12 '20 at 06:18
… idiomatic C++, so also no surprise there. The performance of a language (which is a concept that doesn't even make sense, but let's just roll with it), is generally a function of the amount of resources, money, time, research, and engineers thrown at optimizing it, and Java beats almost any other language in those categories. — Jörg W Mittag, Aug 12 '20 at 06:19
(Disclaimer: I didn't dive into your algorithm.) Typically, by changing language, runtime environment, hardware etc. you might gain or lose a factor up to 10. By choosing a better algorithm, factors of 1000 and more aren't uncommon. — Ralf Kleberhoff, Aug 12 '20 at 07:10

score 9 · Accepted Answer · edited Aug 12 '20 at 05:59

Always use cargo build --release or cargo run --release to have rustc/llvm optimize your code when you try to squeeze performance:

$ cargo build 2>/dev/null && time cargo -q run 2>/dev/null

Running (10400600 choices)...
Done (9.487796306610107s)!

real    0m9,512s
user    0m9,500s
sys     0m0,000s

$ cargo build --release 2>/dev/null && time cargo -q run --release 2>/dev/null
    
Running (10400600 choices)...
Done (3.2046568393707275s)!

real    0m3,229s
user    0m3,222s
sys     0m0,008s

Java is almost twice as fast as Rust?!? What am I missing?

1 Answers1