Pack four bytes in a float

Question

I'm writing a shader (HLSL), and I need to pack a color value into the R32 format. I've found various pieces of code for packing a float into the R8G8B8A8 format, but none of them seem to work in reverse. I'm targeting SM3.0, so (afaik) bit operations are not an option.

To sum it up, I need to be able to do this:

float4 color = ...; // Where color ranges from 0 -> 1
float packedValue = pack(color);

Anyone know how to do this?

UPDATE
I've made some headway... perhaps this will help to clarify the question.
My temporary solution is as follows:

const int PRECISION = 64;

float4 unpack(float value)
{   
    float4 color;

    color.a = value % PRECISION;
    value = floor(value / PRECISION);

    color.b = value % PRECISION;
    value = floor(value / PRECISION);

    color.g = value % PRECISION;
    value = floor(value / PRECISION);

    color.r = value;

    return color / (PRECISION - 1);
}

float pack(float4 color)
{   
    int4 iVal = floor(color * (PRECISION - 1));

    float output = 0;

    output += iVal.r * PRECISION * PRECISION * PRECISION;
    output += iVal.g * PRECISION * PRECISION;
    output += iVal.b * PRECISION;
    output += iVal.a;

    return output;
}

I'm basically... pretending I'm using integer types :s
Through guess and check, 64 was the highest number I could use while still maintaining a [0...1] range. Unfortunately, that also means I'm losing some precision - 6 bits instead of 8.

It's going to be asked eventually, so you might as well get it over with :) why do you have to use R32 instead of an integer format? — eodabash, Jan 27 '11 at 01:10
Heh... I knew it was coming, but still... In actuality, I'm using R32G32B32A32, I only said R32 to simplify. I'm just trying to figure out how I can shove as much information as possible into a single render target. Of course, I still need to do some measurements, but I imagine shoving all my data into a single render target is a little cheaper than using four via MRT. — YellPika, Jan 27 '11 at 03:04
I'm looking forward to seeing what answers people can come up with for this question. — Olhovsky, Jan 29 '11 at 19:11
I know I'm not helping to answer your question, but I'm skeptical that there's anything to be gained by doing all this work to pack things into one big render target rather than just using MRT. Not to mention that some older hardware (like XBox 360) doesn't even support R32G32B32A32. — eodabash, Jan 31 '11 at 20:11

Olhovsky · Answer 1 · 2011-01-29T19:11:10.047

1

Have a look at: http://diaryofagraphicsprogrammer.blogspot.com/2009/10/bitmasks-packing-data-into-fp-render.html

The short answer is that it's not possible to do a lossless packing of 4 floats into 1 float.

Even if you do find a way to pack 4 floats, storing their exponent and significand, the packing and unpacking process may be prohibitively expensive.

edited Jan 29 '11 at 19:11

answered Jan 29 '11 at 19:03



Well, in reality (and as the title says) I'm trying to pack four bytes, not four floats. I have seen that post before though. Unfortunately the code uses bit operators - not going to happen using SM3.0... :( – YellPika Jan 29 '11 at 19:36
You can replace bitshifts with multiplies. E.g. `x << 16` is the same thing as `x * 2^16`, i.e. `x * 65536` – Olhovsky Jan 30 '11 at 18:56
Oh, although he does use bitmasks. And you can't use those in SM3.0. – Olhovsky Jan 30 '11 at 19:49
@YellPika, the situation is the same with 4 bytes. You will suffer precision loss when packing. – Olhovsky Jan 30 '11 at 19:50
I can't just do x * 2^n because I'm dealing with floating point types, and I don't think they'll work the same as integer types. Also, isn't the size of a float the same size as four bytes? – YellPika Jan 30 '11 at 20:21
The size is the same, but 8 bits are reserved for exponent, and 1 bit is reserved for sign. So performing a mapping from 4 bytes to float is hard/expensive. I'm not sure if you can do it without precision loss. I don't think so. – Olhovsky Jan 31 '11 at 06:13
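The shift and mask replacements discussed in these comments could be sketched as follows. This is only a minimal sketch: the helper names are hypothetical (they are not from the linked post), and it assumes every value is a non-negative whole number small enough to be stored exactly in a 32-bit float (below 2^24).

```hlsl
// Hypothetical helpers, not taken from the linked post.
// x must be a non-negative whole number below 2^24 (16,777,216),
// and pow2 a power of two, e.g. 256.0 to move or keep 8 bits.

// Emulates x << n, with pow2 = 2^n.
float shiftLeft(float x, float pow2)
{
    return x * pow2;
}

// Emulates x >> n, with pow2 = 2^n.
float shiftRight(float x, float pow2)
{
    return floor(x / pow2);
}

// Emulates x & (2^n - 1), i.e. keeps the low n bits, with pow2 = 2^n.
float maskLow(float x, float pow2)
{
    return x - floor(x / pow2) * pow2;
}
```

For example, maskLow(shiftRight(v, 256.0), 256.0) would extract the second-lowest byte of v. Anything that grows past 2^24 stops being a whole number in float, which is where the precision loss mentioned above comes from.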

score 1 · Answer 2 · answered Feb 05 '11 at 10:50

Do as your sample code does for 3 components to pack them into the significand s, then compute exp2(color.g) * (1 + s) to pack the remaining component into the exponent. You'll still lose precision, but it won't be as bad as if you try to pack everything into the significand, as you seem to be doing.

Unfortunately there's no way to avoid losing precision, as there are a bunch of NaN and Inf float values that are both indistinguishable from each other and hard to work with (and possibly not even supported by your GPU, depending on how old it is).

Per · Answer 3 · 2022-07-22T08:46:10.800

You can't pack 4 bytes, but you can pack two integers in the range 0 - 2,047. To encode and decode:

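// number1 and number2 are whole numbers in the range 0 - 2047
float encoded = number1 + number2 * 2048;
int decoded1 = (int)(encoded % 2048);          // low value (number1)
int decoded2 = (int)floor(encoded / 2048);     // high value (number2)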

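A 32-bit float stores 23 explicit significand bits (https://en.wikipedia.org/wiki/Single-precision_floating-point_format), so every integer up to 8,388,608 (2^23) is stored exactly. This bound is conservative: counting the implicit leading bit, integers are exact up to 16,777,216 (2^24). Two numbers below 2,048 need 2,048 * 2,048 = 4,194,304 of those integers, well inside that range.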

You can technically store two numbers up to the square root of 8,388,608, which is 2,896. But 2048 is cleaner to look at, and likely doesn't make a difference.

This has been tested thoroughly in a shader that uses a 32-bit float RGBA to transport 8 separate 0 - 2,047 values into a shader. It's only once you exceed 8,388,608 that rounding errors start to appear, e.g. a stored component of 1,029 comes back as 1,028.
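The RGBA setup described above could look roughly like this. It is a sketch only: the function names and the RANGE constant are made up for illustration, and it assumes every input is already a whole number in the range 0 - 2,047.

```hlsl
// Hypothetical names; RANGE matches the 2048 used in the snippet above.
static const float RANGE = 2048.0;

// Packs eight whole-number inputs (each 0..2047) into one float4.
// The largest packed component is 2047 + 2047 * 2048 = 4,194,303.
float4 packPairs(float4 low, float4 high)
{
    return low + high * RANGE;
}

// Recovers both sets of four values from the packed float4.
void unpackPairs(float4 encoded, out float4 low, out float4 high)
{
    high = floor(encoded / RANGE);   // upper value of each pair
    low  = encoded - high * RANGE;   // remainder is the lower value
}
```

The subtraction is used instead of % only so that both outputs stay float4; it computes the same remainder for these non-negative whole numbers.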

You can do the same for a half float, which has 11 bits of precision (integers up to 2,048), allowing you to pack two values of up to 45 each or, more practically, two values in the range 0 - 31.

Since sampling is the most expensive thing to do in a shader, if you have the choice between two half float textures and one full float texture, I would go with the full float texture and encode more information in it, because it's less sampling.

Also be sure that you use no interpolation in the sampler. In Unity this means using "point" instead of "linear" or "trilinear".
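Outside Unity, for instance in a Direct3D 9 effect file like the SM3.0 setup in the question, point sampling can be requested in the sampler state. A sketch, assuming the effect framework is in use; PackedTexture and PackedSampler are placeholder names:

```hlsl
// Placeholder names; assumes a Direct3D 9 effect (.fx) file.
texture PackedTexture;

sampler PackedSampler = sampler_state
{
    Texture   = <PackedTexture>;
    MinFilter = Point;   // no blending between neighbouring texels
    MagFilter = Point;
    MipFilter = None;    // no blending between mip levels
    AddressU  = Clamp;
    AddressV  = Clamp;
};
```

Any filtering would average packed values from neighbouring texels, and the average of two packed numbers does not decode to anything meaningful.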
