
In Rust, Option is defined as:

pub enum Option<T> {
    None,
    Some(T),
}

Used like so:

fn is_full_moon() -> bool {
    false // stub so the example compiles
}

fn may_return_none() -> Option<i32> {
    if is_full_moon() {
        None
    } else {
        Some(1)
    }
}

fn main() {
    let optional = may_return_none();
    match optional {
        None => println!("None"),
        Some(_) => println!("Some"),
    }
}

I'm not familiar with Rust internals, but I initially assumed it might work similarly to Nullable<T> in .NET, so the compiled logic of my Rust code above would be something like this:

// occupies `sizeof(T) + 1` bytes, possibly more due to alignment padding, so `Nullable<Int32>` consumes at least 5 bytes.
struct Nullable<T> {
    Bool hasValue;
    T value;
}

Nullable<Int32> MayReturnNone() {
    if( isFullMoon )
        // as a `struct`, the Nullable<Int32> instance is returned via the stack
        return new Nullable<Int32>() { HasValue = false };
    else
        return new Nullable<Int32>() { HasValue = true, Value = 1 };
}

void Test() {
    Nullable<Int32> optional = MayReturnNone();
    if( !optional.HasValue ) println("None");
    else                     println("Some");
}

However, this isn't a zero-cost abstraction because of the space required for the Bool hasValue flag - and Rust makes a point of providing zero-cost abstractions.
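
For reference, the actual space cost is easy to check with std::mem::size_of (a quick illustrative check; the exact numbers assume a common 64-bit target):

use std::mem::size_of;

fn main() {
    // i32 is 4 bytes; Option<i32> needs a discriminant plus alignment padding.
    println!("i32:         {} bytes", size_of::<i32>());         // 4
    println!("Option<i32>: {} bytes", size_of::<Option<i32>>()); // 8 on common targets
}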

I realise that Option could instead be implemented by the compiler as a direct return-jump, though it would need the exact jump-target addresses to be provided as arguments on the stack - as though you could push multiple return addresses:

(Pseudocode)

mayReturnNone(returnToIfNone, returnToIfHasValue) {

    if( isFullMoon ) {
        cleanup-current-stackframe
        jump-to returnToIfNone
    } else {
        cleanup-current-stackframe
        push-stack 1
        jump-to returnToIfHasValue
    }
}

test() {

    mayReturnNone( instructionAddressOf( ifNoValue ), instructionAddressOf( ifHasValue ) )
ifHasValue:
    println("Some")
    jump-to done
ifNoValue:
    println("None")
done:
}

Is this how it's implemented? This approach would also work for other enum types in Rust - but the specific application I've demonstrated is very brittle: it breaks if you want to execute code between the call to mayReturnNone and the match statement, for example, because mayReturnNone would jump directly to the match and skip the intermediate instructions.
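
For instance (an illustrative snippet; do_something_else is a hypothetical function standing in for the intermediate work):

fn do_something_else() { /* hypothetical intermediate work */ }

fn main() {
    let optional = may_return_none();
    do_something_else(); // runs between the call and the match; a direct jump would skip this
    match optional {
        None => println!("None"),
        Some(_) => println!("Some"),
    }
}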

Shepmaster
Dai
  • You may also be interested in [the "null pointer optimization"](http://stackoverflow.com/q/30414068/155423) for `Option`. – Shepmaster Mar 31 '17 at 00:50
  • You are misunderstanding zero-cost abstraction. It does not mean that you get functionality for free; it means that you get the least overhead needed to implement your functionality. Or, in the words of Stroustrup (C++): you don't pay for what you don't need, and what you do pay for you could not handcraft better. – Matthieu M. Mar 31 '17 at 06:39
  • Exactly, "zero-cost" refers to the cost incurred *by the abstraction*, not the underlying functionality itself. – user4815162342 Apr 01 '17 at 19:35

3 Answers


It depends entirely on optimization. Consider this implementation (playground):

#![feature(asm)]

extern crate rand;

use rand::Rng;

#[inline(never)]
fn is_full_moon() -> bool {
    rand::thread_rng().gen()
}

fn may_return_none() -> Option<i32> {
    if is_full_moon() { None } else { Some(1) }
}

#[inline(never)]
fn usage() {
    let optional = may_return_none();
    match optional {
        None => unsafe { asm!("nop") },
        Some(v) => unsafe { asm!("nop; nop") },
    }
}

fn main() {
    usage();
}

Here, I've used inline assembly instead of printing because it doesn't clutter up the resulting output as much. Here's the assembly for `usage` when compiled in release mode:

    .section    .text._ZN10playground5usage17hc2760d0a512fe6f1E,"ax",@progbits
    .p2align    4, 0x90
    .type   _ZN10playground5usage17hc2760d0a512fe6f1E,@function
_ZN10playground5usage17hc2760d0a512fe6f1E:
    .cfi_startproc
    pushq   %rax
.Ltmp6:
    .cfi_def_cfa_offset 16
    callq   _ZN10playground12is_full_moon17h78e56c4ffd6b7730E
    testb   %al, %al
    je  .LBB1_2
    #APP
    nop
    #NO_APP
    popq    %rax
    retq
.LBB1_2:
    #APP
    nop
    nop
    #NO_APP
    popq    %rax
    retq
.Lfunc_end1:
    .size   _ZN10playground5usage17hc2760d0a512fe6f1E, .Lfunc_end1-_ZN10playground5usage17hc2760d0a512fe6f1E
    .cfi_endproc

The quick rundown is:

  1. It calls the `is_full_moon` function (`callq _ZN10playground12is_full_moon17h78e56c4ffd6b7730E`).
  2. The resulting random boolean is tested (`testb %al, %al`).
  3. One branch goes to the `nop`, the other goes to the `nop; nop`.

Everything else has been optimized out. The function `may_return_none` effectively no longer exists; no `Option` was ever created and the value 1 was never materialized.

I'm sure that various people have different opinions, but I don't think I could have written this any more optimized.
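
Dropped into the same program, the optimized `usage` behaves as if it had been written by hand like this (a sketch of the equivalent control flow, not actual compiler output):

#[inline(never)]
fn usage_equivalent() {
    // No Option is constructed and no discriminant is stored;
    // the boolean from is_full_moon() drives the branch directly.
    if is_full_moon() {
        unsafe { asm!("nop") }      // the None arm
    } else {
        unsafe { asm!("nop; nop") } // the Some arm
    }
}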


Likewise, if we use the value inside the `Some` (which I changed to 42 to make it easier to spot):

Some(v) => unsafe { asm!("nop; nop" : : "r"(v)) },

Then the value is inlined in the branch that uses it:

    .section    .text._ZN10playground5usage17hc2760d0a512fe6f1E,"ax",@progbits
    .p2align    4, 0x90
    .type   _ZN10playground5usage17hc2760d0a512fe6f1E,@function
_ZN10playground5usage17hc2760d0a512fe6f1E:
    .cfi_startproc
    pushq   %rax
.Ltmp6:
    .cfi_def_cfa_offset 16
    callq   _ZN10playground12is_full_moon17h78e56c4ffd6b7730E
    testb   %al, %al
    je  .LBB1_2
    #APP
    nop
    #NO_APP
    popq    %rax
    retq
.LBB1_2:
    movl    $42, %eax  ;; Here it is
    #APP
    nop
    nop
    #NO_APP
    popq    %rax
    retq
.Lfunc_end1:
    .size   _ZN10playground5usage17hc2760d0a512fe6f1E, .Lfunc_end1-_ZN10playground5usage17hc2760d0a512fe6f1E
    .cfi_endproc

However, nothing can "optimize" around a contractual obligation; if a function has to return an Option, it has to return an Option:

#[inline(never)]
pub fn may_return_none() -> Option<i32> {
    if is_full_moon() { None } else { Some(42) }
}

This makes some Deep Magic assembly:

    .section    .text._ZN10playground15may_return_none17ha1178226d153ece2E,"ax",@progbits
    .p2align    4, 0x90
    .type   _ZN10playground15may_return_none17ha1178226d153ece2E,@function
_ZN10playground15may_return_none17ha1178226d153ece2E:
    .cfi_startproc
    pushq   %rax
.Ltmp6:
    .cfi_def_cfa_offset 16
    callq   _ZN10playground12is_full_moon17h78e56c4ffd6b7730E
    movabsq $180388626432, %rdx
    leaq    1(%rdx), %rcx
    testb   %al, %al
    cmovneq %rdx, %rcx
    movq    %rcx, %rax
    popq    %rcx
    retq
.Lfunc_end1:
    .size   _ZN10playground15may_return_none17ha1178226d153ece2E, .Lfunc_end1-_ZN10playground15may_return_none17ha1178226d153ece2E
    .cfi_endproc

Let's hope I get this right...

  1. Load the 64-bit value 0x2A00000000 into `%rdx`. 0x2A is 42. This is our Option being built: the None variant, with 42 already sitting in the (ignored) upper half.
  2. Load `%rdx + 1` into `%rcx`. This is the Some variant: the discriminant becomes 1 and the payload is still 42.
  3. Test the random value (`testb %al, %al`).
  4. Depending on the result, either overwrite `%rcx` with the None value from `%rdx` or leave the Some value in place (`cmovneq %rdx, %rcx`).
  5. Move `%rcx` into `%rax`, the return register.
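
To double-check that reading, here is a standalone sketch that assumes the layout the assembly implies (discriminant in the low 32 bits, i32 payload in the high 32 bits):

fn main() {
    let none_bits: u64 = 180388626432;
    assert_eq!(none_bits, 0x2A_0000_0000);  // 0x2A == 42 sits in the upper half
    let tag = none_bits as u32;             // low 32 bits: 0, i.e. None
    let payload = (none_bits >> 32) as u32; // high 32 bits: 42, ignored while the tag says None
    let some_bits = none_bits + 1;          // the `leaq 1(%rdx), %rcx`: tag becomes 1, i.e. Some(42)
    println!("tag = {}, payload = {}, some_tag = {}", tag, payload, some_bits as u32);
}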

The main point here is that regardless of optimization, a function that says it's going to return data in a specific format has to do so. Only when it's inlined with other code is it valid to remove that abstraction.

Shepmaster

Warning: this comes from the debug build, not release. See the other answer for an optimised version which behaves differently.

You can check the code on the Rust playground

The function compiles to:

    .cfi_startproc
    pushq   %rbp
.Ltmp6:
    .cfi_def_cfa_offset 16
.Ltmp7:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp8:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
.Ltmp9:
    .loc    1 6 0 prologue_end
    callq   is_full_moon@PLT
    movb    %al, -9(%rbp)
    movb    -9(%rbp), %al
    testb   $1, %al
    jne .LBB1_3
    jmp .LBB1_4
.LBB1_3:
    .loc    1 7 0
    movl    $0, -8(%rbp)
    .loc    1 6 0
    jmp .LBB1_5
.LBB1_4:
    .loc    1 10 0
    movl    $1, -8(%rbp)
    movl    $1, -4(%rbp)
.LBB1_5:
    .loc    1 12 0
    movq    -8(%rbp), %rax
    addq    $16, %rsp
    popq    %rbp
    retq
.Ltmp10:
.Lfunc_end1:
    .size   _ZN8rust_out15may_return_none17hb9719b83eae05d85E, .Lfunc_end1-_ZN8rust_out15may_return_none17hb9719b83eae05d85E
    .cfi_endproc

This isn't really returning to different places. The space for Option<i32> contains the i32 value as well, so your function is writing either just the None/Some marker:

movl    $0, -8(%rbp)

Or the value as well:

movl    $1, -8(%rbp)
movl    $1, -4(%rbp)
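
In other words, the debug build lays the return value out roughly like this hypothetical struct (the field names are made up; the real enum layout is unspecified):

#[repr(C)]
struct OptionI32Layout {
    tag: i32,   // 0 = None, 1 = Some; the write to -8(%rbp)
    value: i32, // the payload; the write to -4(%rbp)
}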

So I guess the answer to your question is that this:

Rust makes a point of providing zero-cost abstractions

is an assumption that doesn't apply to every single case.

viraptor
  • Thank you - I didn't think of testing my hypothesis with compiler output. Can you briefly annotate the assembly? If I'm reading it correctly, the compiler is storing the "has value" flag in a register - as that doesn't consume real memory I suppose that would still make it zero-cost, at least in this case. – Dai Mar 31 '17 at 00:36
  • *is an assumption that doesn't apply to every single case* — the meaning of this is that the programmer couldn't write it better. Would you mind sharing what a better representation of an `Option` would be? – Shepmaster Mar 31 '17 at 00:46
  • @Dai The Option value is saved on the stack. `%rbp` is the current stack position, and `-8(%rbp)` means some local variable. – viraptor Mar 31 '17 at 02:35
  • @Shepmaster You can generate separate code paths for None / Some results which skip the tag completely (similar to what the question implied). Of course it depends on whether you care about the stack space, but less about code size... it could matter if you're in a recursive function building `Option`s (a pathological case, of course). Here, it doesn't matter. The better summary is probably "literal zero-cost isn't always practical/worth considering"? – viraptor Mar 31 '17 at 02:43
  • @viraptor That's an interesting idea (and one that I've thought about with regard to language design), but it only works if the function is only ever called as the argument for a `match`. But `Option` is semantically *data*, and can be used as such; you could, for instance, call a function returning an `Option`, hold onto it in a variable for a while, and then pass it as an argument to another function. – Kyle Strand Mar 31 '17 at 17:41

A bit late to the party, but I googled the same thing and I believe this thread is missing a few points.

The Rust optimizer can elide the discriminant when the payload type has an invalid bit pattern available. This is an optimization in the compiler for all enum types, not just Option. Types like Option<bool> or Option<&mut T> will not store an additional flag, but Option<i32> will, as there is no invalid value for i32.

In Rust terminology, this invalid bit pattern is called a niche.
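
You can observe this directly with std::mem::size_of (an illustrative check; the pointer case is guaranteed, the others hold on current compilers):

use std::mem::size_of;

fn main() {
    // With a niche, None reuses an invalid bit pattern, so no extra flag is stored:
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>()); // the null pointer is the niche
    assert_eq!(size_of::<Option<bool>>(), size_of::<bool>()); // any value other than 0 or 1
    // Without a niche, a separate discriminant (plus padding) is required:
    assert!(size_of::<Option<i32>>() > size_of::<i32>());
}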

Of course, other optimizations can apply as well, as already mentioned above. The compiler can apply optimizations like constant propagation to elide the Option entirely, but that is beyond the scope of this thread.

Side note: this is actually extremely similar to what a C programmer would do. An optional pointer is normally represented as NULL; a nullable int is more of a problem, so some functions, like open, choose to return a value that makes no sense in the given situation, such as -1. However, that is impossible to do automatically, for obvious reasons.
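
Rust does let you declare such a sentinel explicitly, though: the std::num::NonZero* types promise that 0 never occurs, which hands Option a niche to use (a side illustration of the same idea, not something covered above):

use std::mem::size_of;
use std::num::NonZeroI32;

fn main() {
    // 0 is an invalid NonZeroI32, so None can be encoded as the all-zero pattern:
    assert_eq!(size_of::<Option<NonZeroI32>>(), size_of::<i32>());
    // Constructing one still checks the invariant at the boundary:
    assert!(NonZeroI32::new(3).is_some());
    assert!(NonZeroI32::new(0).is_none());
}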


miko3k