
Intro:

I'm curious about the performance difference (both CPU and memory usage) of storing small numbers as bit-packed unsigned integers versus vectors of bytes.

Example

I'll use the example of storing RGBA values. They're 4 bytes, so it is very tempting to store them as a u32.
However, it would be more readable to store them as a vector of type u8.


As a more detailed example, say I want to store and retrieve the color rgba(255,0,0,255).

This is how I would go about doing the two methods:

// Bitpacked:
let i: u32 = 4278190335;
// binary is 11111111 00000000 00000000 11111111
// In reality I would most likely do something more similar to:
let i: u32 = (255 << 24) + 255; // parentheses needed: `+` binds tighter than `<<`

// Vector:
let v: Vec<u8> = vec![255, 0, 0, 255];

Then the two red values could be queried with

i >> 24
// or
v[0]
// both expressions evaluate to 255 (I'm really new to Rust <3 )

Question 1

As far as I know, the values of v must be stored on the heap, with the performance costs that come with that. Are these costs significant enough to make bit packing worth it?

Question 2

Then there are the two expressions i >> 24 and v[0]. I don't know how fast Rust is at bit shifting versus getting values off the heap. I'd test it, but I won't have access to a machine with Rust installed for a while. Are there any immediate insights someone could give on the drawbacks of these two operations?

Question 3

Finally, is the difference in memory usage as simple as just storing 32 bits on the stack for the u32 versus storing 64 bits on the stack for the pointer v as well as 32 bits on the heap for the values of v?
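For reference, I believe the sizes could be checked with something like this once I have a machine (just a sketch; the exact figures assume a 64-bit target):

```rust
use std::mem::size_of;

fn main() {
    // A u32 is 4 bytes wherever it lives.
    assert_eq!(size_of::<u32>(), 4);

    // The Vec<u8> handle on the stack is three machine words:
    // pointer, length, and capacity (24 bytes on a 64-bit target),
    // plus the separate heap allocation holding the elements.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());

    // A fixed-size array [u8; 4] is 4 bytes, entirely on the stack.
    assert_eq!(size_of::<[u8; 4]>(), 4);
}
```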

Sorry if this question is a bit confusing

Jam
  • Rust does have arrays as well as `Vec`. The size must be a constant but that would be fine here. This would save you the overhead of heap allocation, etc. – Nate Eldredge Jun 02 '21 at 03:57
    Even if you don't have a machine to test, there is Rust support on the [godbolt compiler explorer](https://godbolt.org/), so you could see what kind of code it generates and what optimizations are done. But is it really that hard to install it yourself? – Nate Eldredge Jun 02 '21 at 04:01
  • Welp, that answers most of my question. I didn't realize array values were stored on the stack! I'm still curious about the difference between bit shifting versus getting an array index though. As well as if there's any under-the-hood costs associated with either – Jam Jun 02 '21 at 04:02
  • @NateEldredge i just won't have internet access save my phone data for a week or so – Jam Jun 02 '21 at 04:02
  • "*I'm still curious about the difference between bit shifting versus getting an array index though*"—depends on hardware. For most CPUs, if a value is not aligned to its word size, it will have to undertake bit masking & shifting on memory access anyway. So if you're on a 64-bit CPU, accessing the elements of a `[u8]` will perform all that shifting under the hood anyway. The difference is merely the interface that you, the programmer, use—and you should choose that which most naturally fits the problem you're modelling, for ease of understanding and maintenance. – eggyal Jun 02 '21 at 06:37

2 Answers


Using a Vec will be more expensive: as you mentioned, it needs to perform heap allocations, and element access is bounds-checked as well.

That said, if you use an array [u8; 4] instead, the performance compared with a bitpacked u32 representation should be almost identical.

In fact, consider the following simple example:

pub fn get_red_bitpacked(i: u32) -> u8 {
    (i >> 24) as u8
}

pub fn get_red_array(v: [u8; 4]) -> u8 {
    v[3]
}

pub fn test_bits(colour: u8) -> u8 {
    let colour = colour as u32;
    let i = (colour << 24) + colour;
    get_red_bitpacked(i)
}

pub fn test_arr(colour: u8) -> u8 {
    let v = [colour, 0, 0, colour];
    get_red_array(v)
}

I took a look on Compiler Explorer, and the compiler decided that get_red_bitpacked and get_red_array were completely identical: so much so it didn't even bother generating code for the former. The two "test" functions obviously optimised to the exact same assembly as well.

example::get_red_array:
        mov     eax, edi
        shr     eax, 24
        ret

example::test_bits:
        mov     eax, edi
        ret

example::test_arr:
        mov     eax, edi
        ret

Obviously this example was seen through by the compiler: for a proper comparison you should benchmark with actual code. That said, I feel fairly safe in saying that with Rust the performance of u32 versus [u8; 4] for these kinds of operations should be identical in general.
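If you want a rough sanity check once you do have a machine, something along these lines would do (my own sketch, not a rigorous benchmark; it uses std::hint::black_box, stable since Rust 1.66, to stop the optimiser folding the loops away — a real comparison should use a harness like criterion):

```rust
use std::hint::black_box;
use std::time::Instant;

fn get_red_bitpacked(i: u32) -> u8 {
    (i >> 24) as u8
}

fn get_red_array(v: [u8; 4]) -> u8 {
    v[3]
}

fn main() {
    const N: u32 = 10_000_000;

    let start = Instant::now();
    let mut sum_bits = 0u64;
    for i in 0..N {
        // black_box hides the input from the optimiser so the loop isn't folded away
        sum_bits += u64::from(get_red_bitpacked(black_box(i)));
    }
    println!("bitpacked: {:?}", start.elapsed());

    let start = Instant::now();
    let mut sum_arr = 0u64;
    for i in 0..N {
        // to_le_bytes puts the high byte (the "red" position above) at index 3
        sum_arr += u64::from(get_red_array(black_box(i.to_le_bytes())));
    }
    println!("array:     {:?}", start.elapsed());

    // Both paths extract the same byte, so the totals must agree.
    assert_eq!(sum_bits, sum_arr);
}
```

Expect the two timings to be close to identical, for the reasons shown in the assembly above.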

N. Shead
  • Interesting! I often forget that there are many optimizations done in the compilation step – Jam Jun 02 '21 at 08:20

tl;dr use a struct:

struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

Maybe use #[repr(packed)] as well.

It gives you the best of both worlds, and you can refer to the channels by name.
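For example, converting between the struct and a packed u32 is straightforward with to_be_bytes/from_be_bytes (a sketch; the to_u32/from_u32 helper names are my own, not from any crate):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

impl Color {
    // Pack as 0xRRGGBBAA, matching the layout in the question.
    fn to_u32(self) -> u32 {
        u32::from_be_bytes([self.r, self.g, self.b, self.a])
    }

    fn from_u32(i: u32) -> Self {
        let [r, g, b, a] = i.to_be_bytes();
        Color { r, g, b, a }
    }
}

fn main() {
    let red = Color { r: 255, g: 0, b: 0, a: 255 };
    assert_eq!(red.to_u32(), 4278190335); // 0xFF0000FF
    assert_eq!(Color::from_u32(4278190335), red);
}
```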

Are these costs significant enough to make bit packing worth it?

Heap allocation has a huge cost.

Are there any immediate insights someone could give on the drawbacks of these two operations?

Both are noise compared to allocating memory.

Acorn