7

The built-in Unity shaders support a technique for encoding a 32-bit float into a 32-bit RGBA value and decoding it back. This is done by multiplying each channel by the highest possible value of the channel before it. Some loss of precision is expected since the value is stored in a float.

The shader clearly uses some optimizations that I am trying to understand.

The code in UnityCG.cginc looks like this:

// Encoding/decoding [0..1) floats into 8 bit/channel RGBA. Note that 1.0 will not be encoded properly.
inline float4 EncodeFloatRGBA( float v )
{
    float4 kEncodeMul = float4(1.0, 255.0, 65025.0, 16581375.0);
    float kEncodeBit = 1.0/255.0;
    float4 enc = kEncodeMul * v;
    enc = frac (enc);
    enc -= enc.yzww * kEncodeBit;
    return enc;
}
inline float DecodeFloatRGBA( float4 enc )
{
    float4 kDecodeDot = float4(1.0, 1/255.0, 1/65025.0, 1/16581375.0);
    return dot( enc, kDecodeDot );
}

So my questions:

  1. Why is the G channel multiplied by 255 and not 256 (2^8 = 256), the B channel by 65025 and not 65536 (2^16 = 65536), and the A channel by 16581375 and not 16777216 (2^24 = 16777216)?
  2. The dot product multiplies by fractions, so f = R + 255 * G + 65025 * B + 16581375 * A would not give a compatible result. Why this choice?
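For experimentation, here is a straight Python transcription of the shader above (my own sketch; the function names are just Python stand-ins for the HLSL ones). It shows that, despite the fractional weights, the decode does reconstruct the input:

```python
import math

def encode_float_rgba(v):
    # float4 enc = frac(kEncodeMul * v);
    enc = [math.modf(v * m)[0] for m in (1.0, 255.0, 65025.0, 16581375.0)]
    # enc -= enc.yzww * kEncodeBit;  (kEncodeBit = 1/255)
    bit = 1.0 / 255.0
    return [enc[0] - enc[1] * bit,
            enc[1] - enc[2] * bit,
            enc[2] - enc[3] * bit,
            enc[3] - enc[3] * bit]

def decode_float_rgba(enc):
    # dot(enc, float4(1, 1/255, 1/65025, 1/16581375))
    k = (1.0, 1.0 / 255.0, 1.0 / 65025.0, 1.0 / 16581375.0)
    return sum(e * w for e, w in zip(enc, k))

v = 0.3
print(abs(decode_float_rgba(encode_float_rgba(v)) - v))  # tiny, on the order of 1e-10
```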
Tedd Hansen
  • as far as I understand 255 * 255 = 65025, it might give you clues on your first question but I don't understand the second question :D – Nika Kasradze Jan 10 '17 at 10:14
  • Right, but there is 256 colors for each channel - not 255. :) – Tedd Hansen Jan 10 '17 at 10:47
  • And as I said, I saw that it's multiplied by 255 and not 256 too. :P – Tedd Hansen Jan 10 '17 at 10:57
  • ok, [**stanlo**'s answer here](https://www.gamedev.net/topic/442138-packing-a-float-into-a-a8r8g8b8-texture-shader/?whichpage=1#2936108) should help you out. I guess this is because the biggest number you can interpret in 8 bits is 255 so they range each channel from 0 to 255 (256 shades total). Even unity's color picker tool is ranged [0, 255]. – Nika Kasradze Jan 10 '17 at 11:27
  • but in the same thread at the bottom you can see calculations using 256's in the formula which works ok and totally messes up my point :D – Nika Kasradze Jan 10 '17 at 11:31
  • For a simplified example: If you can store 10 colors then the range is 0-9. If you want to add another 10 colors the range for them would be 10-19. To store these separately combined in one integer you would have to do A + B * 10. For example 0+1*10=10. Not A + B * 9 where result would be 0+1*9=9 which overlaps with last color variation in first segment (9+0*9=9). – Tedd Hansen Jan 10 '17 at 11:36
  • Relevant: [Why we always divide RGB values by 255?](http://stackoverflow.com/questions/20486700/why-we-always-divide-rgb-values-by-255) – MX D Jan 10 '17 at 12:19
  • Same misconception. We store colors in 0-255, but there are 256 different colors. Added a small proof of concept https://www.shadertoy.com/view/4tGSW1 . I was expecting overlapping colors at intersecting values, but it breaks down already when it sees 1.0f, effectively limiting (in this example) a 256-color channel to 255 colors. But I guess that doesn't matter as long as the numbers for encode and decode match - it's just a fraction anyway. – Tedd Hansen Jan 10 '17 at 13:03

2 Answers

6

From inspection, the Unity code looks like it wants to convert a float value between 0.0 and 1.0 (not including 1.0) into four float values between 0.0 and 1.0, such that those values can be converted into integer values from 0 to 255 by multiplying by 255.

But, dang, you are really correct to be skeptical about this code. It has many flaws (but usually produces results close enough to be mostly usable).

The reason they multiply by 255 instead of 256 is that they hold the erroneous belief that they can get reasonable results by keeping the values as floats (planning to convert the floats to 0-255-valued integers at a later time, as others have mentioned in the comments). But then they use that frac() call. You should recognize floating-point code that looks like this as having a bad code smell™.

Correct code would look something like this:

inline float4 EncodeFloatRGBA(float v)
{
    // Interpret v in [0, 1) as a 32-bit fixed-point value and split it into 4 bytes.
    uint vi = (uint)(v * (256.0f * 256.0f * 256.0f * 256.0f));
    uint ex = (vi / (256u * 256u * 256u)) % 256u;
    uint ey = (vi / (256u * 256u)) % 256u;
    uint ez = (vi / 256u) % 256u;
    uint ew = vi % 256u;
    return float4(ex / 255.0f, ey / 255.0f, ez / 255.0f, ew / 255.0f);
}

and

inline float DecodeFloatRGBA(float4 enc)
{
    // Recover each byte (the +0.5f rounds to nearest, guarding against float error),
    // then reassemble the 32-bit fixed-point value.
    uint ex = (uint)(enc.x * 255 + 0.5f);
    uint ey = (uint)(enc.y * 255 + 0.5f);
    uint ez = (uint)(enc.z * 255 + 0.5f);
    uint ew = (uint)(enc.w * 255 + 0.5f);
    uint v = (ex << 24) | (ey << 16) | (ez << 8) | ew;
    return v / (256.0f * 256.0f * 256.0f * 256.0f);
}
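To check this outside a shader, here is a Python sketch of the same integer-based scheme (names are mine, not from any library). The round trip is stable at the 32-bit fixed-point resolution:

```python
def encode_float_rgba(v):
    # Interpret v in [0, 1) as a 32-bit fixed-point number, split it into 4 bytes,
    # and express each byte as a [0, 1] channel value.
    vi = int(v * 256.0 ** 4)
    ex = (vi >> 24) & 0xFF
    ey = (vi >> 16) & 0xFF
    ez = (vi >> 8) & 0xFF
    ew = vi & 0xFF
    return [ex / 255.0, ey / 255.0, ez / 255.0, ew / 255.0]

def decode_float_rgba(enc):
    # Recover each byte (rounding guards against float error) and reassemble.
    ex, ey, ez, ew = (int(round(c * 255)) for c in enc)
    vi = (ex << 24) | (ey << 16) | (ez << 8) | ew
    return vi / 256.0 ** 4

v = 0.3
w = decode_float_rgba(encode_float_rgba(v))   # within 1/2^32 of v
assert decode_float_rgba(encode_float_rgba(w)) == w  # exact once quantized
```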

The Unity code fails to accurately do a round trip about 23% of the time given random input (it fails about 90% of the time if you don't use extra processing like rounding the encoded values after multiplying by 255). The code above works 100% of the time.

Note that 32-bit floats have only 23 bits of mantissa, so the 32-bit RGBA values will have leading or trailing 0 bits. The cases where you need the trailing bits while the leading bits are 0 are probably few and far between, so you could likely simplify the code to drop the ew value entirely and encode as RGB instead of RGBA.

<rant>
All in all, I find the Unity code disturbing because it tries to reinvent something we already have. We have a perfectly good IEEE 754 standard for encoding floats into 32-bit values, and RGBA is usually at least 32 bits (the Unity code certainly assumes it is). I'm not sure why they don't just plop the float into the RGBA (you could still use an intermediate float4, as the code below does, if you want). If you just put the float into the RGBA, you don't have to worry about the 23 bits of precision and you are not limited to values between 0.0 and 1.0. You can even encode infinities and NaNs. That code looks like:

inline float4 EncodeFloatRGBA(float v)
{
    byte[] eb = BitConverter.GetBytes(v);
    if (BitConverter.IsLittleEndian)
    {
        return float4(eb[3] / 255.0f, eb[2] / 255.0f, eb[1] / 255.0f, eb[0] / 255.0f);
    }

    return float4(eb[0] / 255.0f, eb[1] / 255.0f, eb[2] / 255.0f, eb[3] / 255.0f);
}

and

inline float DecodeFloatRGBA(float4 enc) 
{
    var eb = BitConverter.IsLittleEndian ?
        new[] { (byte)(enc.w * 255), (byte)(enc.z * 255),
                (byte)(enc.y * 255), (byte)(enc.x * 255) } :
        new[] { (byte)(enc.x * 255), (byte)(enc.y * 255),
                (byte)(enc.z * 255), (byte)(enc.w * 255) };
    return BitConverter.ToSingle(eb, 0);
}

</rant>

mheyman
    One reason not to drop a float directly into the 32 bits of an RGBA texture is that interpolated texture access will butcher your results. By treating the four bytes as base-256 digits, linear interpolation of the texture also correctly interpolates the represented floats. – Martin Ender Feb 08 '19 at 16:25
  • Do those types you use actually exist in hlsl or cg? If not, then your rant is kinda pointless, isn't it? Sorry for the necromancy. Just thought someone reading this now should know. – Školstvo Jul 28 '20 at 19:29
  • What would the equivalents of `byte` and `BitConverter.IsLittleEndian` be in hlsl? Can this translate directly to shader code? – pixelpax Aug 18 '22 at 20:59
1

  1. The outputs of the shader are floats in [0..1] that are later converted to U8 values in [0..255] by the GPU when stored in an RGBA8 buffer. That is where the *255 (instead of *256) comes from. Using 256 would be incorrect.

The line `enc -= enc.yzww * kEncodeBit;` might look weird, but it actually makes sense: it trims off the bits that the lower channels will carry, avoiding double counting and rounding problems.

  2. The dot product actually does properly rebuild the original value.
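To see both points end to end, here is a small Python sketch (my own; the `store_rgba8` step stands in for the GPU's round(x*255)/255 quantization when writing to an RGBA8 buffer). The frac-based encode, followed by the simulated store and the dot-product decode, recovers the input:

```python
import math

KMUL = (1.0, 255.0, 65025.0, 16581375.0)

def encode(v):
    # Unity's frac-based encode: enc = frac(kEncodeMul * v); enc -= enc.yzww / 255
    f = [math.modf(v * m)[0] for m in KMUL]
    bit = 1.0 / 255.0
    return [f[0] - f[1] * bit, f[1] - f[2] * bit,
            f[2] - f[3] * bit, f[3] - f[3] * bit]

def store_rgba8(enc):
    # What the GPU does when writing a [0,1] float to an 8-bit channel.
    return [round(c * 255) / 255.0 for c in enc]

def decode(enc):
    # dot(enc, float4(1, 1/255, 1/65025, 1/16581375))
    return sum(c / m for c, m in zip(enc, KMUL))

v = 0.3
print(decode(store_rgba8(encode(v))))  # ≈ 0.3
```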
Sylvain