
How is the match expression implemented at a high level? What happens under the hood that lets the compiler know how to direct a given value to one branch versus another, figuring it out at compile time? I don't see how this is possible without storing type information for use at runtime.

Something like this example:

enum BinaryTree {
    Leaf(i32),
    Node(Box<BinaryTree>, i32, Box<BinaryTree>),
}

fn tree_weight_v1(t: BinaryTree) -> i32 {
    match t {
        BinaryTree::Leaf(payload) => payload,
        BinaryTree::Node(left, payload, right) => {
            tree_weight_v1(*left) + payload + tree_weight_v1(*right)
        }
    }
}

/// Returns a tree that looks like:
///
///      +----(4)---+
///      |          |
///   +-(2)-+      [5]
///   |     |   
///  [1]   [3]
///
fn sample_tree() -> BinaryTree {
    let l1 = Box::new(BinaryTree::Leaf(1));
    let l3 = Box::new(BinaryTree::Leaf(3));
    let n2 = Box::new(BinaryTree::Node(l1, 2, l3));
    let l5 = Box::new(BinaryTree::Leaf(5));

    BinaryTree::Node(n2, 4, l5)
}

#[test]
fn tree_demo_1() {
    let tree = sample_tree();
    assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5);
}

By runtime type information, I mean you literally store a pointer to the type definition or something like that, so under the hood when the runtime execution reaches the match, it simply checks value.type == given_pattern_expression or some variant of that. As I understand it, this is not the case: Rust doesn't store the type alongside structs/records at runtime. Instead, it is somehow computed at compile time.

I could probably implement this sort of feature if I stored the .type on each struct/record. Then I would simply look up the type and check whether it matches. However, I cannot see a way to implement this purely at the compiler level, so that matching is figured out before runtime.

I think it happens at compile time because of posts like "Is it possible to do a compile time type check on a generic in Rust?" and "Runtime trait implementation checking?", amongst many others, which seem to suggest that everything happens at compile time, except for a few cases where you can opt into runtime checking (this is my understanding from summarizing dozens of articles over the past few days). Another quote:

One does need to check whether a given BinaryTree is a Leaf or is a Node, but the compiler statically ensures such checks are done: you cannot accidentally interpret the data of a Leaf as if it were a Node, nor vice versa.

That to me says the compiler statically figures out the pattern matching in the BinaryTree case I posted above. Somehow.

Integrating Pattern Matching with Type Checking is a random post about programming languages that agrees with my understanding, that you need runtime type information to accomplish this. So I am confused.

If types are patterns, you need to be able to determine the type of a value at runtime. In general, this requires runtime type-information (which many languages intentionally erase); but it also requires some supertype that contains the two cases (which many languages intentionally do not have).

Building a runtime reflection system for Rust (Part 1) also explains how Rust doesn't have runtime reflection, which supports my thinking that everything happens at compile time.

https://oswalt.dev/2021/06/polymorphism-in-rust/

Asked by Lance, edited by Herohtar
  • When you talk about runtime type information, are you talking about storing the fact that `t` is a `BinaryTree` at runtime? If so, how would that help? Or are you talking about storing the information whether `t` is a `Leaf` or `Node`? In that case, that information (i.e. the enum's tag) *is* stored at runtime, but it's not type information because `Leaf` and `Node` aren't types. – sepp2k Mar 04 '22 at 22:34

2 Answers


A match expression does not need runtime type information; as a match only accepts a single expression of a single known type, by definition it can leverage compile time information.

See also:

match at compile time vs runtime

At compile time, every match expression will be verified to be exhaustive: all possible shapes of the value are handled.

At run time, the code will determine which specific match arm is executed. You can think of a match as implemented via a fancy if-else chain.

As we humans tend to be not-extremely-precise when communicating, it's highly likely that some resources blur the line between these two aspects.
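To make the "fancy if-else chain" idea concrete, here is a minimal sketch using a hypothetical `Shape` enum (not from the question). Both functions compute the same result; the `match` version additionally gets the compile-time exhaustiveness check, while at run time both only inspect which variant the value holds. The value's type is always `Shape`, so no type test is involved.

```rust
use std::f64::consts::PI;

// Hypothetical enum used only for this illustration.
#[derive(Debug)]
enum Shape {
    Circle(f64),
    Rect(f64, f64),
}

// Ordinary `match`: the compiler verifies that every variant is handled.
fn area_match(s: &Shape) -> f64 {
    match s {
        Shape::Circle(r) => PI * r * r,
        Shape::Rect(w, h) => w * h,
    }
}

// Roughly the same logic as an if/else chain. At run time, both versions
// only look at which variant `s` holds; its type is statically `Shape`.
fn area_chain(s: &Shape) -> f64 {
    if let Shape::Circle(r) = s {
        PI * r * r
    } else if let Shape::Rect(w, h) = s {
        w * h
    } else {
        // Not checked by the compiler here: an if/else chain has no
        // exhaustiveness guarantee, which is exactly what `match` adds.
        unreachable!("Shape has only two variants")
    }
}

fn main() {
    let shapes = [Shape::Circle(1.0), Shape::Rect(2.0, 3.0)];
    for s in &shapes {
        assert_eq!(area_match(s), area_chain(s));
        println!("{:?} -> {}", s, area_match(s));
    }
}
```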

Concretely focusing on an enum

Enum variants are not standalone types. That is, given an enum Foo, Foo::Bar is not a type — it's a value of type Foo. This is the same as how false is not a type — it's a value of type bool. The same logic applies for i32 (type) and 42 (value).

In most cases, enums are implemented as a sea of bytes large enough to hold any one of the enum's variants, with the variants' data overlapping one another in those same bytes. This is known as a union.

Then a discriminant is added next to this soup of bytes. This is an integer value that specifies which variant the value is. Adding the discriminant makes it into a tagged union.
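The standard library exposes this tag through `std::mem::discriminant`, which returns an opaque `Discriminant<T>` that can be compared for equality. A small sketch, reusing a `Foo` enum of the same shape as the one in the other answer: values of the same variant compare equal regardless of their payload, and values of different variants compare unequal even though all of them have the same type.

```rust
use std::mem;

// Illustrative enum; the payloads are never read, hence the allow.
#[allow(dead_code)]
enum Foo {
    Bar(i32),
    Baz { a: i32, b: i8 },
}

fn main() {
    let x = Foo::Bar(1);
    let y = Foo::Bar(99);
    let z = Foo::Baz { a: 1, b: 2 };

    // Same variant => equal discriminants, whatever the payload is.
    assert_eq!(mem::discriminant(&x), mem::discriminant(&y));

    // Different variants => different discriminants, although x, y and z
    // all have exactly the same type: `Foo`.
    assert_ne!(mem::discriminant(&x), mem::discriminant(&z));

    println!("variants told apart by their tag, not by their type");
}
```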

Matching on an enum is conceptually similar to this pseudo-Rust:

if discriminant(&enum_value) == VARIANT_1_DISCR {
    let data = mem::transmute(data(&enum_value));
} else if discriminant(&enum_value) == VARIANT_2_DISCR {
    let data = mem::transmute(data(&enum_value));
}
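The pseudo-Rust above can be turned into something runnable by writing the tagged union by hand with a Rust `union` plus an integer tag. This is only a sketch of the concept under assumed names (`FooData`, `BAR`, `BAZ`, `describe` are hypothetical); the layout the compiler actually picks for a real `enum` may differ.

```rust
// Hypothetical integer tags, one per "variant".
const BAR: u8 = 0;
const BAZ: u8 = 1;

// The overlapping payload bytes: only one field is valid at a time.
union FooData {
    bar: i32,
    baz: f32,
}

// The discriminant sits next to the union, forming a tagged union.
struct Foo {
    tag: u8,
    data: FooData,
}

impl Foo {
    fn bar(v: i32) -> Self {
        Foo { tag: BAR, data: FooData { bar: v } }
    }
    fn baz(v: f32) -> Self {
        Foo { tag: BAZ, data: FooData { baz: v } }
    }
}

// The "match": compare the integer tag, then read the matching field.
fn describe(foo: &Foo) -> String {
    if foo.tag == BAR {
        // SAFETY: the BAR tag is only ever stored together with `bar`.
        let data = unsafe { foo.data.bar };
        format!("Bar({})", data)
    } else if foo.tag == BAZ {
        // SAFETY: the BAZ tag is only ever stored together with `baz`.
        let data = unsafe { foo.data.baz };
        format!("Baz({})", data)
    } else {
        unreachable!("invalid tag")
    }
}

fn main() {
    println!("{}", describe(&Foo::bar(7)));
    println!("{}", describe(&Foo::baz(1.5)));
}
```

A real `enum` gives you the same run-time behavior, but the compiler writes the tag checks for you and guarantees you can never read the wrong field.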

See also:

Reflection

how Rust doesn't have runtime reflection

I wouldn't agree that it doesn't have it, but it certainly doesn't have it by default. Runtime type information in Rust will usually leverage the Any trait:

fn thing(value: &dyn std::any::Any) {
    if let Some(s) = value.downcast_ref::<String>() {
        dbg!(s.len());
    } else if let Some(i) = value.downcast_ref::<i32>() {
        dbg!(i + 1);
    }
}

The absence of the Any trait when matching is an additional hint that no runtime type information is present.

Other languages

you need to be able to determine the type of a value at runtime

This is only true if you allow a value to be of multiple types. Rust does not, as it's a statically-typed language. In a dynamically-typed language where you want to match a value using a type, you would indeed need to have some amount of runtime type information.
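In Rust, the closest analogue to "matching a dynamically typed value by its type" is to list every possible case up front as an enum variant. The enum then plays the role of the "supertype that contains the cases" from the quoted post, and the run-time check is just a tag comparison. A minimal sketch with a hypothetical `Value` type:

```rust
// Hypothetical "dynamic" value: every kind of value it may hold is a
// variant, so the full set of cases is known at compile time.
enum Value {
    Int(i64),
    Text(String),
    List(Vec<Value>),
}

// "Matching on the type" of a Value is really matching on its variant tag.
fn describe(v: &Value) -> String {
    match v {
        Value::Int(i) => format!("int {}", i),
        Value::Text(s) => format!("text of length {}", s.len()),
        Value::List(items) => format!("list of {} items", items.len()),
    }
}

fn main() {
    let values = vec![
        Value::Int(42),
        Value::Text("hi".to_string()),
        Value::List(vec![Value::Int(1), Value::Int(2)]),
    ];
    for v in &values {
        println!("{}", describe(v));
    }
}
```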

Answer by Shepmaster
  • How does it determine which match arm to pick at runtime, yet not have any type information stored at runtime? – Lance Mar 04 '22 at 22:38
  • @Lance I'm struggling with how to express that answer differently. `match` statements do not have conditionals based on types, so you never need to test what type something is. – Shepmaster Mar 04 '22 at 22:44
  • @Lance the solution is tagged unions in some shape or form. Make sure the size of the enum is large enough to fit any of its variants, then stick an extra byte on the front to distinguish them. This can change for some special edge cases as well, like how `Option<NonNull<T>>` knows that a null pointer will be the `None` variant, so in memory it can just be a `*mut T`. – Locke Mar 04 '22 at 22:49

The style of enum Rust uses, which can hold data, is generally referred to as a "tagged union". The general idea is that a distinguishing tag is stored alongside the union of possible values.

Tagged Unions in Rust

For example, here I print an enum value in Rust using a match:

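pub enum Foo {
    Bar(i32),
    Baz {
        a: i32,
        b: i8,
    }
}

fn main() {
    let foo = Foo::Baz { a: 1, b: 2 };

    // The tag decides which arm runs; the type is always `Foo`.
    match foo {
        Foo::Bar(x) => println!("Bar({})", x),
        Foo::Baz { a, b } => println!("Baz {{ a: {}, b: {} }}", a, b),
    }
}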

Tagged Unions in Memory

And here is how you might do the same thing using a language without tagged unions like C:

#include <stdint.h>
#include <stdio.h>

struct Foo {
    enum {
        Bar, Baz
    } tag;

    union {
        // Data for Bar
        int32_t Bar;

        // Data for Baz
        struct {
            int32_t a;
            int8_t b;
        } Baz;
    } data;
};

void print_foo(struct Foo foo) {
    // Check the tag to distinguish variants before accessing data
    switch (foo.tag) {
        case Bar:
            printf("Bar(%d)\n", foo.data.Bar);
            break;
        case Baz:
            printf("Baz { a: %d, b: %d }\n", foo.data.Baz.a, foo.data.Baz.b);
            break;
    }
}

Now, I probably made a mistake or two in my C version and someone will probably be able to point out some improvements, but I hope it generally shows how a tagged union can distinguish between variants when fetching data.

Tagged Unions in Rust

I don't know much about how tagged unions are implemented in Rust, but it is easy to imagine that Rust could expand on these systems to further increase the efficiency of the match. For instance, if you have one enum type nested in another, the compiler may be able to merge their tags so that you don't need multiple comparisons (e.g. the compiler might compress Option<Result<A, B>> into an OptionResultAB which only requires a single tag).

Another well-known example is how Option<NonNull<T>> has the same size as *mut T, since null can be used as the None value. There are also a few other similar cases, such as function pointers.
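These size claims can be checked directly with `std::mem::size_of`. A short sketch: the first four assertions correspond to cases the `Option` documentation guarantees; the raw-pointer case shows that a type which can legitimately be null has no spare value, so `Option` needs a separate tag. The nested `Option<Result<..>>` sizes are only printed, because whether the tags get merged is a compiler decision rather than a guarantee.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // Guaranteed: `None` uses the null value the inner type can never hold,
    // so no extra tag byte is needed.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
    assert_eq!(size_of::<Option<fn()>>(), size_of::<fn()>());

    // A raw pointer may be null, so there is no spare bit pattern and the
    // Option must store its tag separately (making it larger).
    assert!(size_of::<Option<*mut u8>>() > size_of::<*mut u8>());

    // Nested enums: tag merging is not guaranteed, so only print the sizes.
    println!("Result<u32, u8>:         {}", size_of::<Result<u32, u8>>());
    println!("Option<Result<u32, u8>>: {}", size_of::<Option<Result<u32, u8>>>());
}
```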

References

While I am just speculating on the form of tagged unions in Rust, I did find a better researched explanation which does a good job explaining the concept:

Answer by Locke
  • Great answer. If you want to beef up the last section, Rust's efficiency comes from its so-called *niche optimization* system. The compiler identifies that a type contains unused values, or niches, which it can use to squeeze a tag into an existing type without needing more space. Usually this is an implementation detail, though some niche optimizations are guaranteed for stability or FFI reasons. `Option>` is guaranteed to be optimized to a single pointer utilizing the null pointer niche for `None`. – John Kugelman Mar 04 '22 at 23:30
  • Additionally, if you dump the LLVM IR from Rust's compiler, you can see how its tagged unions are structured (since they're lowered into LLVM aggregate types). Effectively, you'll see the tag followed by an i32 array (large enough for the variants' contents). Then, when used, this array space is `bitcast` to store the contents of specific constructors/variants (because LLVM has no concept of `union` types, so it's emulated). – Colin James Mar 05 '22 at 11:47