The question is what the emitted assembly does
There's no need to guess; you can just look. Let's use this code:
use std::ops::Add;
#[derive(Copy, Clone, Debug)]
struct Num(i32);
impl Add for Num {
type Output = Num;
fn add(self, rhs: Num) -> Num {
Num(self.0 + rhs.0)
}
}
#[inline(never)]
fn example() -> Num {
let a = Num(0);
let b = Num(1);
let c = a + b;
let d = a + b;
c + d
}
fn main() {
println!("{:?}", example());
}
Paste it into the Rust Playground, then select the Release mode and view the LLVM IR:

Search through the result to see the definition of the example
function:
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h60e923840d8c0cd0E() unnamed_addr #2 {
start:
ret i32 2
}
That's right, this was completely and totally evaluated at compile time and simplified all the way down to a simple constant. Compilers are pretty good nowadays.
Maybe you want to try something not quite as hardcoded?
#[inline(never)]
fn example(a: Num, b: Num) -> Num {
let c = a + b;
let d = a + b;
c + d
}
fn main() {
let something = std::env::args().count();
println!("{:?}", example(Num(something as i32), Num(1)));
}
Produces
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h73d4138fe5e9856fE(i32 %a) unnamed_addr #3 {
start:
%0 = shl i32 %a, 1
%1 = add i32 %0, 2
ret i32 %1
}
Oops, the compiler saw that we we basically doing (x + 1) * 2, so it did some tricky optimizations here to get to 2x + 2. Let's try something harder...
#[inline(never)]
fn example(a: Num, b: Num) -> Num {
a + b
}
fn main() {
let something = std::env::args().count() as i32;
let another = std::env::vars().count() as i32;
println!("{:?}", example(Num(something), Num(another)));
}
Produces
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h73d4138fe5e9856fE(i32 %a, i32 %b) unnamed_addr #3 {
start:
%0 = add i32 %b, %a
ret i32 %0
}
A simple add
instruction.
The real takeaway from this is:
- Look at the generated assembly for your cases. Even similar-looking code might optimize differently.
- Perform micro and macro benchmarking. You never know exactly how the code will play out in the big picture. Maybe all your cache will be blown, but your micro benchmarks will be stellar.
is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?
As you just saw, the Rust compiler plus LLVM are pretty smart. In general, it is possible to elide copies when it knows that the operand isn't needed. Whether it will work in your case or not is tough to answer.
Even if it did, you might not want to be passing large items via the stack as it's always possible that it will need to be copied.
And note that you don't have to implement copy for the value, you can choose to only allow it via references:
impl<'a, 'b> Add<&'b Num> for &'a Num {
type Output = Num;
fn add(self, rhs: &'b Num) -> Num {
Num(self.0 + rhs.0)
}
}
In fact, you may want to implement both ways of adding them, and maybe all 4 permutations of value / reference!