1
use std::mem::size_of;

struct Position {
    x: f32,
    y: f32,
    z: f32,
}

struct PoolItem {
    entity_id: u32, // 4 bytes
    used: bool, // 1 byte + 3 bytes of padding
    component: Position, // 12 bytes
}


fn main() {
    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<Position>(), 12);
    assert_eq!(size_of::<PoolItem>(), 20);
}

As you can see, this structure is 20 bytes long. Position is actually optional: it only holds meaningful data when used is true.

Would using Option remove the need for the used field and decrease the structure size to 16 bytes?

struct PoolItem {
    entity_id: u32, // 4 bytes
    component: Option<Position>, // 12 bytes ?
}

If so, how is Option implemented for such a behavior to work?

My tests on Playground seem to indicate it doesn't work. Why?

trent
Narann
  • Original question was: “Can `Option` type removes the need of a boolean field?”. The edited version loses the hint, IMHO. – Narann Apr 06 '20 at 10:24
  • What's the "hint" that is lost? You can feel free to edit the question again if my edit violated the spirit of the question; it seemed to me the main thrust of the question was to ask whether using `Option` would decrease the size of `PoolItem` vs. using a `bool` (to which the answer is no). In any case, my edit was made after the first two answers were already posted, so both answerers were responding to your original question, not my edit. – trent Apr 06 '20 at 15:40

3 Answers

5

The precise implementation of Option doesn't really matter. What's obvious is that you can't store X bytes of data in X bytes of storage and also store whether or not the data is there at all. An obvious implementation for Option would be to store both the object and a boolean indicating whether the object exists; clearly something like that is happening here. Option is a convenience, but it still has to store that information somewhere.

Note that outside of a struct (which must have consistent size) Option might avoid this cost, if the optimizer determines the Option has known "populated or not" status at all times, so the boolean might be elided in favor of the code always using it deterministically in the correct way (either reading the object from the stack if it logically exists, or not doing so when it doesn't). But in this case, the extra data is needed.
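To make the "object plus boolean" comparison concrete, here is a minimal sketch using the Position type from the question. Flagged is a hypothetical name introduced only for illustration, and the sizes are what current rustc produces on common targets; the default Rust layout is not formally guaranteed.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Position {
    x: f32,
    y: f32,
    z: f32,
}

// Hypothetical hand-written equivalent of Option<Position>:
// a presence flag stored next to the payload.
#[allow(dead_code)]
struct Flagged {
    present: bool,
    value: Position,
}

// The struct from the question, with `used` replaced by Option.
#[allow(dead_code)]
struct PoolItem {
    entity_id: u32,
    component: Option<Position>,
}

fn main() {
    // Every bit pattern of three f32 values is a valid Position,
    // so Option<Position> needs extra storage for its Some/None tag.
    println!("Position:         {}", size_of::<Position>());
    println!("Option<Position>: {}", size_of::<Option<Position>>());
    println!("Flagged:          {}", size_of::<Flagged>());
    println!("PoolItem:         {}", size_of::<PoolItem>());
}
```

The Option version ends up no smaller than the original struct with its used field, which matches what the asker saw on the Playground.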

ShadowRanger
  • Since `Position` can never be a null pointer, the absence of `Position` could be encoded as null pointer, while the presence as the pointer to the `Position` object itself. It does not seem to be so, but until now I believed that `Option` was encoded this way and that "Zero-cost-abstraction" was the keyword for this behavior. – CoronA Apr 05 '20 at 13:29
  • 4
    There's no pointer. The 12 bytes of `Position` are embedded directly in the parent struct. – John Kugelman Apr 05 '20 at 13:32
  • 1
    Worth noting that changing entity_id to be a NonZeroU32 would allow the Option layout optimization to happen, with the small cost of having entities starting from id 1. – mgostIH Apr 05 '20 at 13:32
  • 1
    @CoronA: Beyond JohnKugelman's point, a pointer wouldn't be zero cost either. You'd be paying 4-8 bytes for the pointer, and the whole cost of the `Position` (along with allocator overhead) on top of that. And worse, you'd have to perform pointer lookup to "somewhere else" memory (may not be in cache line/page, and therefore goes much slower), where embedding the object directly avoids that expense (both the boolean and the object are adjacent, and should typically be in the same cache line). Avoiding that means allocating them in a block, and now you're using a pointer as a glorified boolean. – ShadowRanger Apr 05 '20 at 13:38
  • 2
    @CoronA It's worthwhile to note that if you did put `Position` behind a non-nullable pointer, such as a `Box`, `&Position` or `Rc`, the compiler would in fact encode it as you suggest. "Zero-cost abstraction" is a general term usually applied when comparing a built-in abstraction (like an `Option`) to a rhetorical hand-coded equivalent (like a struct containing a `T` and a boolean flag). In this sense, `Option` is indeed zero-cost, because even if the flag is necessary, it wouldn't be possible to hand-code it any better. – trent Apr 05 '20 at 16:13
3

Option<Position> needs to store the state (Some or None) somewhere, and because every one of Position's 12 bytes already carries information, there is no room left for it. Usually this means Option adds an extra byte (plus padding) to store the state. The exception is when the inner type has a known unused state (a niche). For example, a reference can never point to address 0, so Option<&'_ T> can use 0 as the None state and take up the same number of bytes as &'_ T. For your Position type, however, that's not the case.
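The difference between types with and without an unused state can be checked directly. The sketch below assumes the Position type from the question; option_overhead is a hypothetical helper written for this illustration, not a standard library function.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

#[allow(dead_code)]
struct Position {
    x: f32,
    y: f32,
    z: f32,
}

// Hypothetical helper: how many extra bytes does wrapping T in Option cost?
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // References, Box and NonZeroU32 can never be zero,
    // so None is encoded as the all-zero bit pattern at no extra cost.
    assert_eq!(option_overhead::<&Position>(), 0);
    assert_eq!(option_overhead::<Box<Position>>(), 0);
    assert_eq!(option_overhead::<NonZeroU32>(), 0);

    // Position has no unused bit pattern, so the tag needs its own storage.
    assert!(option_overhead::<Position>() > 0);
}
```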

If you absolutely need your PoolItem struct to be as small as possible, and if you can spare one bit from your entity_id field (say, the highest bit, 2^31), you can use that to store the state instead:

const COMPONENT_USED_BIT: u32 = 1 << 31;

struct PoolItem {
    entity_id: u32, // lowest 31 bits = entity ID, highest bit = "component used"
    component: Position,
}

This might become a bit complex, since you need to ensure that you're treating that bit specially, but you can write a couple of simple accessor methods to ensure that the special bit is dealt with correctly.

impl PoolItem {
    /// Get entity ID, without the "component used" bit
    fn entity_id(&self) -> u32 {
        self.entity_id & !COMPONENT_USED_BIT
    }

    /// Set entity ID, keeping the existing "component used" bit
    fn set_entity_id(&mut self, entity_id: u32) {
        let component_used_bit = self.entity_id & COMPONENT_USED_BIT;
        self.entity_id = (entity_id & !COMPONENT_USED_BIT) | component_used_bit;
    }

    /// Get component if "component used" bit is set
    fn component(&self) -> Option<&Position> {
        if self.entity_id & COMPONENT_USED_BIT != 0 {
            Some(&self.component)
        } else {
            None
        }
    }

    /// Set component, updating the "component used" bit
    fn set_component(&mut self, component: Option<Position>) {
        if let Some(component) = component {
            self.component = component;
            self.entity_id |= COMPONENT_USED_BIT;
        } else {
            self.entity_id &= !COMPONENT_USED_BIT;
        }
    }
}
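As a usage sketch (not the answer's linked Playground code), the accessors above can be exercised as follows. The definitions are repeated so that the snippet compiles on its own; the derives on Position, the demo function and the ID 704 are assumptions made for the example.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    x: f32,
    y: f32,
    z: f32,
}

const COMPONENT_USED_BIT: u32 = 1 << 31;

struct PoolItem {
    entity_id: u32, // lowest 31 bits = entity ID, highest bit = "component used"
    component: Position,
}

impl PoolItem {
    fn entity_id(&self) -> u32 {
        self.entity_id & !COMPONENT_USED_BIT
    }

    fn set_entity_id(&mut self, entity_id: u32) {
        let component_used_bit = self.entity_id & COMPONENT_USED_BIT;
        self.entity_id = (entity_id & !COMPONENT_USED_BIT) | component_used_bit;
    }

    fn component(&self) -> Option<&Position> {
        if self.entity_id & COMPONENT_USED_BIT != 0 {
            Some(&self.component)
        } else {
            None
        }
    }

    fn set_component(&mut self, component: Option<Position>) {
        if let Some(component) = component {
            self.component = component;
            self.entity_id |= COMPONENT_USED_BIT;
        } else {
            self.entity_id &= !COMPONENT_USED_BIT;
        }
    }
}

// Hypothetical demo: build an item with an ID and a component.
fn demo() -> PoolItem {
    let mut item = PoolItem {
        entity_id: 0,
        component: Position { x: 0.0, y: 0.0, z: 0.0 },
    };
    item.set_entity_id(704);
    item.set_component(Some(Position { x: 1.0, y: 2.0, z: 3.0 }));
    item
}

fn main() {
    let mut item = demo();
    assert_eq!(item.entity_id(), 704);
    assert_eq!(item.component(), Some(&Position { x: 1.0, y: 2.0, z: 3.0 }));

    // Clearing the component leaves the entity ID untouched.
    item.set_component(None);
    assert_eq!(item.component(), None);
    assert_eq!(item.entity_id(), 704);

    // No separate flag field, so the struct is 4 + 12 bytes.
    assert_eq!(size_of::<PoolItem>(), 16);
}
```

Note that set_component(None) only clears the flag bit; the stale Position bytes stay in place but are no longer reachable through component().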

Playground example with tests

Frxstrem
  • What about [`NonZeroU32`](https://doc.rust-lang.org/std/num/struct.NonZeroU32.html)? – Narann Apr 05 '20 at 17:46
  • @Narann `NonZeroU32` and similar types allow the same compiler optimizations as references (since `0` is not a valid state), so `Option<NonZeroU32>` takes up as many bytes as `NonZeroU32`. – Frxstrem Apr 05 '20 at 17:51
  • Of course, but wouldn't `Option<NonZeroU32>` avoid dealing with bit masks? – Narann Apr 05 '20 at 19:51
  • @Narann Yes, it would, I guess, but as the question is written, it asks about making the `component` field optional. Using `Option<NonZeroU32>` would make the `entity_id` field optional, which is not the same. As I've described, it's not possible to make the `Position` type optional without increasing the size (or decreasing the size of something else, as in my answer). – Frxstrem Apr 05 '20 at 20:01
  • Thanks, unfortunately, the question has been edited and changed the hint. Original question was: “Can Option type removes the need of a boolean field?” – Narann Apr 05 '20 at 20:19
-2

As suggested in the comments, an alternative would be to use Option with NonZeroU32 for entity_id and rely on Some and None to check whether the entity is used.

use std::mem::size_of;

// Position is the struct defined in the question.
struct PoolItem {
    entity_id: Option<core::num::NonZeroU32>, // 4 bytes: None is stored as 0
    component: Position, // 12 bytes
}

fn main() {
    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<Position>(), 12);
    assert_eq!(size_of::<PoolItem>(), 16);
}

This makes entity IDs start from 1.
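A minimal sketch of what starting from 1 means in practice, assuming IDs are derived from a zero-based slot index; id_from_index is a hypothetical helper, not part of the answer's Playground code.

```rust
use std::num::NonZeroU32;

// Hypothetical helper: map a zero-based slot index to a one-based entity ID.
// checked_add returns None at u32::MAX instead of wrapping around to 0.
fn id_from_index(index: u32) -> Option<NonZeroU32> {
    index.checked_add(1).and_then(NonZeroU32::new)
}

fn main() {
    // 0 is not a valid NonZeroU32, which is what frees it up to represent None.
    assert_eq!(NonZeroU32::new(0), None);

    // The first slot gets ID 1.
    let first = id_from_index(0).expect("index 0 maps to a valid ID");
    assert_eq!(first.get(), 1);

    // The last u32 index has no valid ID under this scheme.
    assert_eq!(id_from_index(u32::MAX), None);
}
```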

Playground

Narann
  • 3
    This isn't the same as what the OP was asking about. They have an `entity_id` and optional `component`; you have a `component` and optional `entity_id`. It's not possible to use the niche in `entity_id` to encode the presence or absence of a `Position`. Frxstrem's answer shows how you can restrict the range of `entity_id` to do this correctly (but you need to use a whole bit of `entity_id`, not just the one niche value). – trent Apr 05 '20 at 16:15
  • I'm OP, and my original question has been edited. Original question was: “Can Option type removes the need of a boolean field?”. But you're technically right. Using `Option<NonZeroU32>` implies `entity_id` is `NonZeroU32` almost everywhere else in the code base. – Narann Apr 06 '20 at 10:19
  • 1
    Yes, I edited it, and I don't see how the edit has anything to do with this answer. The question, both before and after the edits, asks how to make a struct with an ID and optional `Position`; this answer shows how to make a struct with a `Position` and optional ID. You cannot use this struct to encode the value `PoolItem { entity_id: 704, component: None }`, for example, as you can with `Option<Position>`. Making `entity_id` optional is not isomorphic to making `Position` optional. – trent Apr 06 '20 at 15:44