32

What would be the correct way of converting a color value from float to byte? At first I thought b=f*255.0 should do it, but now I think that in this case only exactly 1.0 will be converted to 255, while 0.9999 will already be 254, which is probably not what I want...

It seems that b=f*256.0 would be better, except that it has the unwanted case of producing 256 for exactly 1.0.

In the end I'm using this:

#define F2B(f) ((f) >= 1.0 ? 255 : (int)((f)*256.0))
inkredibl
  • BTW, `0.9999` is extremely close to `1.0`, and should definitely be converted to `255`. Any solution that fails to do so would be wrong. – ToolmakerSteve Oct 04 '17 at 23:57
  • NOTE: Having thoroughly analyzed the math, I've made an in-depth case that [round(f * 255.0) is the optimal solution](https://stackoverflow.com/a/46575472/199364) - despite all the answers that are based on `* 256` or `*255.999`. (Though in practice, it's usually not significant - the accepted answer's formula is fine. It's also fine to substitute `255.999` for `256` in that answer. My analysis shows that neither of those is optimal - any change from the optimal formula increases the error for some values - but the error increase is minor.) – ToolmakerSteve Oct 07 '17 at 00:57
  • I have **summarized** the benefits and drawbacks of the top 3 methods [in this answer](https://stackoverflow.com/a/66862750/365102). – Mateen Ulhaq Mar 29 '21 at 23:07
  • If `(256.0 * f)` yields: `(0.9)`, I would think that would be best represented by `{1}` if you are trying to represent the **nearest** color - not `{0}`. For this application, `round(255.0 * x)` seems better - even though the uniformity of the alternative is more appealing at first. – Brett Hale Oct 19 '22 at 12:22

10 Answers

39

1.0 is the only case that can go wrong, so handle that case separately:

b = floor(f >= 1.0 ? 255 : f * 256.0)

Also, it might be worth forcing that f really is 0<=f<=1 to avoid incorrect behaviour due to rounding errors (eg. f=1.0000001).

f2 = max(0.0, min(1.0, f))
b = floor(f2 == 1.0 ? 255 : f2 * 256.0)

Alternative safe solutions:

b = (f >= 1.0 ? 255 : (f <= 0.0 ? 0 : (int)floor(f * 256.0)))

or

b = max(0, min(255, (int)floor(f * 256.0)))
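As a rough illustration, the last "safe" variant could be written in C++ as follows. This is only a sketch: the name `float_to_byte_floor` is made up for this example, and the clamp is applied to the floored double before the cast so that out-of-range inputs never reach the integer conversion.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from the answer): maps [0, 1] to 0..255 using
// 256 equal-width bins, with exactly 1.0 folded into the last bin.
inline std::uint8_t float_to_byte_floor(double f) {
    const double scaled = std::floor(f * 256.0);
    // Clamp while still in floating point, so values slightly outside
    // [0, 1] (e.g. 1.0000001 from rounding error) end up as 0 or 255.
    const double clamped = std::max(0.0, std::min(255.0, scaled));
    return static_cast<std::uint8_t>(clamped);
}
```

With this sketch, 0.9999 and 1.0 both give 255, 0.5 gives 128, and negative inputs give 0.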
ToolmakerSteve
Mark Byers
  • The only thing that concerns me is that now 255 suddenly has a subtly higher range than all the others ;). – inkredibl Dec 16 '09 at 11:55
  • The "problem" comes from having a closed interval instead of a half-closed interval. There's no way to fix this without having one interval slightly larger than the others. Console yourself by knowing that the distribution of floats in the interval [0,1] is not uniform (they are more densely packed near zero) so there's no guarantee that the other intervals are the same size either. – Mark Byers Dec 16 '09 at 12:09
  • 255 should cover values from 0.99609375 included to 1.0 excluded. This answer suggests to include 1.0 to the interval. Indeed this is very subtle. For me this is the best possible answer. – mouviciel Dec 16 '09 at 12:10
  • To further clarify: if you start with a closed interval and split it into (say) two intervals, the only thing you can do is to make one half-closed interval and one closed interval. There's no way around the fact that one interval is "slightly larger" than the other. It's best not to worry about it. :) – Mark Byers Dec 16 '09 at 12:14
  • Note: I updated my comment to explicitly (rather than implicitly) call the floor function, in case it is unclear. – Mark Byers Dec 16 '09 at 12:24
  • Why wouldn't you just do `round(f * 255)`? No testing or clamping needed, unless `f` is significantly beyond 1.0. [See my answer](https://stackoverflow.com/a/46575472/199364) for discussion. – ToolmakerSteve Oct 04 '17 at 23:01
  • Optimization: You can avoid the call to `floor` if (1) you ensure the input to `floor` is always positive and (2) you're casting to an integer value. For example: `b = (f >= 1.0 ? 255 : (f <= 0.0 ? 0 : (int)floor(f * 256.0)))` can be rewritten as `b = f >= 1.0 ? 255 : f <= 0.0 ? 0 : (int)(f * 256.0)`. Note this does not work for `b = max(0, min(255, (int)floor(f * 256.0)))` because the result of `floor` is clamped, not the input. – pauln Aug 30 '18 at 14:30
23

I've always done round(f * 255.0).

There is no need for the testing (special case for 1) and/or clamping in other answers. Whether this is a desirable answer for your purposes depends on whether your goal is to match input values as closely as possible [my formula], or to divide each component into 256 equal intervals [other formulas].

The possible downside of my formula is that the 0 and 255 intervals only have half the width of the other intervals. Over years of usage, I have yet to see any visual evidence that that is bad. On the contrary, I've found it preferable to not hit either extreme until the input is quite close to it - but that is a matter of taste.

The possible upside is that [I believe] the relative values of R-G-B components are (slightly) more accurate, for a wider range of input values.
Though I haven't tried to prove this, that is my intuitive sense, given that for each component I round to get the closest available integer. (E.g. I believe that if a color has G ~= 2 x R, this formula will more often stay close to that ratio; though the difference is quite small, and there are many other colors that the 256 formula does better on. So it may be a wash.)

In practice, either 256 or 255-based approaches seem to provide good results.


Another way to evaluate 255 vs 256, is to examine the other direction -
converting from 0..255 byte to 0.0..1.0 float.

The formula that converts 0..255 integer values to equally spaced values in range 0.0..1.0 is:

f = b / 255.0

Going in this direction, there is no question as to whether to use 255 or 256: the above formula is the formula that yields equally spaced results. Observe that it uses 255.

To understand the relationship between the 255 formulas in the two directions, consider this diagram for the case where you only have 2 bits, hence integer values 0..3:

Diagram using 3 for two bits, analogous to 255 for 8 bits. Conversion can be from top to bottom, or from bottom to top:

0 --|-- 1 --|-- 2 --|-- 3  
0 --|--1/3--|--2/3--|-- 1
   1/6     1/2     5/6

The | are the boundaries between the 4 ranges. Observe that in the interior, the float values and the integer values are at the midpoints of their ranges. Observe that the spacing between all values is constant in both representations.
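The pairing of the two 255-based formulas can be checked with a short C++ sketch. The function names are invented for this illustration, and `float_to_byte_round` assumes its input is already in [0, 1].

```cpp
#include <cmath>
#include <cstdint>

// byte -> float: equally spaced values in 0.0..1.0.
inline double byte_to_float(std::uint8_t b) {
    return b / 255.0;
}

// float -> byte by rounding to the nearest level.
// Assumption: f is already within [0, 1]; no clamping is done here.
inline std::uint8_t float_to_byte_round(double f) {
    return static_cast<std::uint8_t>(std::lround(f * 255.0));
}

// Returns true if every byte value survives byte -> float -> byte unchanged.
inline bool round_trip_is_lossless() {
    for (int b = 0; b <= 255; ++b) {
        const double f = byte_to_float(static_cast<std::uint8_t>(b));
        if (float_to_byte_round(f) != b) {
            return false;
        }
    }
    return true;
}
```

Because each float produced by `b / 255.0` sits at the center of the range that rounds back to `b`, the round trip should reproduce every byte value.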

If you grasp these diagrams, you will understand why I favor 255-based formulas over 256-based formulas.


Claim: If you use / 255.0 when going from byte to float, but you don't use round(f * 255.0) when going to byte from float, then the "average round-trip" error is increased. Details follow.

This is most easily measured by starting from float, going to byte, then back to float. For a simple analysis, use the 2-bit "0..3" diagrams.

Start with a large number of float values, evenly spaced from 0.0 to 1.0. The round-trip will group all these values at the 4 values.
The diagram has 6 half-interval-length ranges:
0..1/6, 1/6..1/3, .., 5/6..1
For each range, the average round-trip error is half the range, so 1/12 (minimum error is zero, maximum error is 1/6, evenly distributed).
All the ranges give that same error; 1/12 is the overall average round-trip error.

If you instead use any of the * 256 or * 255.999 formulas, most of the round-trip results are the same, but a few are moved to the adjacent range.
Any change to another range increases the error; for example, if the error for a single float input previously was slightly less than 1/6, returning the center of an adjacent range results in an error slightly more than 1/6. E.g. 0.18 in optimal formula => byte 1 => float 1/3 ~= 0.333, for error |0.333-0.18| = 0.153; using a 256 formula => byte 0 => float 0, for error 0.18, which is an increase from the optimal error 0.153.

Diagrams using * 4 with / 3. Conversion is from one line to the next.
Notice the uneven spacing of the first line: 0..3/8, 3/8..5/8, 5/8..1. Those distances are 3/8, 2/8, 3/8. Notice the interval boundaries of last line are different than first line.

   0------|--3/8--|--5/8--|------1
         1/4     1/2     3/4
=> 0------|-- 1 --|-- 2 --|------3  

=> 0----|---1/3---|---2/3---|----1
       1/6       1/2       5/6

The only way to avoid this increased error, is to use some different formula when going from byte to float. If you strongly believe in one of the 256 formulas, then I'll leave it to you to determine the optimal inverse formula.
(Per byte value, it should return the midpoint of the float values which became that byte value. Except 0 to 0, and 3 to 1. Or perhaps 0 to 1/8, 3 to 7/8! In the diagram above, it should take you from middle line back to top line.)

But now you will have the difficult-to-defend situation that you have taken equally-spaced byte values, and converted them to non-equally-spaced float values.

Those are your options if you use any value other than exactly 255, for integers 0..255: Either an increase in average round-trip error, or non-uniformly-spaced values in the float domain.
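The round-trip claim can be tried numerically on the 2-bit example. The following C++ sketch is only an illustration under stated assumptions: the function name and the sampling scheme (midpoints of evenly sized cells across 0..1) are choices made here, and both variants use `/ 3` for the way back to float.

```cpp
#include <cmath>

// Average |f - roundtrip(f)| over evenly spaced samples in (0, 1) for the
// 2-bit case (integer levels 0..3).
// use_round == true  : byte = round(f * 3)           (the "255-style" formula)
// use_round == false : byte = min(3, floor(f * 4))   (the "256-style" formula)
inline double avg_round_trip_error(bool use_round, int samples) {
    const int max_level = 3;
    double total = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double f = (i + 0.5) / samples;  // midpoint of the i-th cell
        int b;
        if (use_round) {
            b = static_cast<int>(std::lround(f * max_level));
        } else {
            b = static_cast<int>(f * (max_level + 1));  // truncation == floor for f >= 0
            if (b > max_level) b = max_level;
        }
        total += std::fabs(f - static_cast<double>(b) / max_level);
    }
    return total / samples;
}
```

With a large sample count, the rounding variant should come out near 1/12 ≈ 0.0833, while the `* 4` variant comes out near 7/72 ≈ 0.0972, which matches the direction of the argument above.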

Newtonx
ToolmakerSteve
  • Since you aim for values which are centered in their range, did you also consider the compromise of mapping [0, 1] to [-0.5, 255.5] (multiplication by 256 before subtracting 0.5) and then rounding? This would leave the values centered but remove the issue with the two smaller intervals. I wonder why this is not the preferred solution. – Camill Trüeb Aug 19 '22 at 16:59
  • Nice! You've identified a third option. I'd have to test "round-trip" errors to have an opinion on it. Also, I don't consider those shortened intervals an "issue". See my comment *"On the contrary, I've found it preferable to not hit either extreme until the input is quite close to it - but that is a matter of taste."* Others might like your suggestion, so thanks! – ToolmakerSteve Aug 19 '22 at 18:36
  • Using `round(x)` or `floor(x + 0.5)` is intuitive for sampling; e.g., integer coordinates correspond to a pixel's centre. I'm not sure it's always desirable for *quantization* given the non-uniform interval mapping to values: {0}, {255}. OTOH, if `(255.0 * x) => (0.75)`, I would prefer it map to {1} rather than {0} for something like color. – Brett Hale Oct 19 '22 at 12:03
8

Why not try something like

b=f*255.999

Gets rid of the special case f==1 but 0.999 is still 255

Erich Kitzmueller
3

If you want to have exact equally sized chunks the following would be the best solution. It converts a range of [0,1] to [0,256[.

#include <cstdint>
#include <limits>

// Greatest double predecessor of 256:
constexpr double MAXCOLOR = 256.0 - std::numeric_limits<double>::epsilon() * 128;

inline uint32_t float_to_int_color(const double color){
  return static_cast<uint32_t>(color * MAXCOLOR);
}

EDIT: To clarify why epsilon(1.0)*128 and not epsilon(1.0)*256.0 is used: the C++ standard specifies the machine epsilon as

the difference between 1.0 and the next value representable by the floating-point type T.

Because 256.0 is represented by an exponent of 8 and a mantissa of 1.0, epsilon(256.0) is too big to retrieve the previous number, which has an exponent of 7. Example:

   0 10000000111 0000000000000000000000000000000000000000000000000000 256.0
 - 0 01111010011 0000000000000000000000000000000000000000000000000000 eps(256.0)
_____________________________________________________________________
 = 0 10000000110 1111111111111111111111111111111111111111111111111110

which should be:

_____________________________________________________________________
 = 0 10000000110 1111111111111111111111111111111111111111111111111111
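One way to check this reasoning is to compare the constant against `std::nextafter`. The sketch below assumes IEEE 754 binary64 doubles; the names `kMaxColor`, `max_color_is_predecessor_of_256`, `epsilon_times_256_overshoots`, and `to_color` are invented for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Assumes IEEE 754 binary64, where epsilon() is 2^-52.
constexpr double kMaxColor =
    256.0 - std::numeric_limits<double>::epsilon() * 128;

// True when kMaxColor is exactly the largest double below 256.0.
inline bool max_color_is_predecessor_of_256() {
    return kMaxColor == std::nextafter(256.0, 0.0);
}

// True when subtracting epsilon * 256 lands below that predecessor,
// i.e. it skips one representable value.
inline bool epsilon_times_256_overshoots() {
    const double candidate =
        256.0 - std::numeric_limits<double>::epsilon() * 256;
    return candidate < std::nextafter(256.0, 0.0);
}

// Same conversion as above, written against the checked constant.
inline std::uint32_t to_color(double color) {
    return static_cast<std::uint32_t>(color * kMaxColor);
}
```

On such a platform both checks should return true, and `to_color(1.0)` yields 255 rather than 256.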
Fabian Keßler
  • That should be 256, not 265. – RalfFriedl Jul 01 '19 at 22:11
  • Not sure I understand why epsilon*128..? – inkredibl Jul 28 '19 at 19:51
  • @inkredibl it's to retrieve the smallest number which can theoretically be represented with the same exponent as 256.0 in the IEEE754 format. After subtracting it from 256.0 we get the biggest floating point predecessor of 256.0 which can be represented. – Fabian Keßler Sep 18 '19 at 10:22
  • Any idea why not epsilon*256? I don’t seem to follow the math on this... – inkredibl Dec 04 '19 at 19:36
  • @inkredibl epsilon*256 would give you the smallest number which is representable with the same exponent, see: https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon. Because 256 is exactly the 8th power of two, the previous number has an exponent of 7, not 8. Therefore epsilon(1)*2⁷ has to be used. – Fabian Keßler Mar 25 '20 at 14:58
  • This result can be verified as the same value given by: `std::nextafter(256.0, 0.0)` - it minimizes the (unavoidable) bias introduced by mapping a 'continuous' range to half-open intervals. +1 for eliminating 'arbitrary' epsilon values like `255.999`, etc., and making the result rigorous. In practice, I'd probably just use:`std::clamp` :) – Brett Hale Oct 19 '22 at 11:54
2

What do you mean by the correct way of converting a color value from float to byte? Do you mean that if you choose uniform random real numbers from the range [0,1[, they will be uniformly distributed among the 256 bins from 0 to 255?

To make things easier, assume that instead of a float value we have a real number, and that instead of an int we want to convert to a two-bit integer, something like a uint2_t: an integer representation that consists of exactly two bits. This means our uint2_t can have the values 00b, 01b, 10b and 11b (the b denotes a binary number; this is also known as Intel convention). Then we have to decide which real-number intervals should be mapped to which integer values. If you want to map [0,0.25[ to 0, [0.25,0.5[ to 1, [0.5,0.75[ to 2 and [0.75,1.0] to 3, the conversion can be done by b = std::floor(f * 4.0) (floor keeps only the integer part of a number and ignores the fractional part). This works for all numbers except f=1. A simple change to b = floor(f >= 1.0 ? 3 : f * 4.0) fixes this problem (for the 8-bit case: b = floor(f >= 1.0 ? 255 : f * 256.0)). This equation ensures that the intervals are equally spaced.

If you assume that our real value is given as a single-precision IEEE 754 floating-point number, then there is a finite number of possible float representations within the interval [0,1]. You have to decide which representations of those real numbers belong to which integer representation. Then you can come up with some source code that converts your float number to an integer and check if it fits your mapping. Maybe int ig = int(255.99 * g); is the right thing for you, or maybe b = floor(f >= 1.0 ? 255 : f * 256.0). It depends on which real number representation you want to map to which integer number representation.

Take a look at the following program. It demonstrates that different conversions do different things:

#include <cmath>
#include <iostream>

constexpr int realToIntegerPeterShirley(const double value) {
    return int(255.99 * value);
}

#define F2B(f) ((f) >= 1.0 ? 255 : (int)((f)*256.0))
constexpr int realToIntegerInkredibl(const double value) {
    return F2B(value);
}

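// std::floor is not guaranteed to be constexpr before C++23, so this is a plain function.
int realToIntegerMarkByers(const double value) {
    return static_cast<int>(std::floor(value >= 1.0 ? 255 : value * 256.0));
}

// Likewise for std::round; constexpr is dropped for portability.
int realToIntegerToolmakerSteve(const double value) {
    return static_cast<int>(std::round(value * 255.0));
}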

constexpr int realToIntegerErichKitzmueller(const double value) {
    return value*255.999;
}

constexpr int realToInteger(const float value) {
    return realToIntegerInkredibl(value);
}

int main() {
    {
        double value = 0.906285;
        std::cout << realToIntegerMarkByers(value) << std::endl; // output '232'
        std::cout << realToIntegerPeterShirley(value) << std::endl; // output '231'
    }

    {
        double value = 0.18345;
        std::cout << realToIntegerInkredibl(value) << std::endl; // output '46'
        std::cout << realToIntegerToolmakerSteve(value) << std::endl; // output '47'
    }

    {
        double value = 0.761719;
        std::cout << realToInteger(value) << std::endl; // output '195'
        std::cout << realToIntegerErichKitzmueller(value) << std::endl; // output '194'
    }
}

You can use this small testbed to make experiments:

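// Replaces main() above; additionally needs <chrono>, <cstdint> and <random>.
#include <chrono>
#include <cstdint>
#include <random>

int main() {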
    std::mt19937_64 rng;
    // initialize the random number generator with time-dependent seed
    uint64_t timeSeed = std::chrono::high_resolution_clock::now().time_since_epoch().count();
    std::seed_seq ss{uint32_t(timeSeed & 0xffffffff), uint32_t(timeSeed>>32)};
    rng.seed(ss);
    // initialize a uniform distribution between 0 and 1
    std::uniform_real_distribution<double> unif(0, 1);
    // ready to generate random numbers
    const int nSimulations = 1000000000;
    for (int i = 0; i < nSimulations; i++)
    {
        double currentRandomNumber = unif(rng);

        int firstProposal = realToIntegerMarkByers(currentRandomNumber);
        int secondProposal = realToIntegerErichKitzmueller(currentRandomNumber);

        if(firstProposal != secondProposal) {
            std::cout << "Different conversion with real " << currentRandomNumber << std::endl;
            return -1;
        }
    }
}

In the end, I would suggest not converting from float to integer at all. Store your image as high dynamic range data and choose a tool (e.g. http://djv.sourceforge.net/) that converts your data to low dynamic range. Tone mapping is a research area of its own, and there are tools that have a nice user interface and offer you all kinds of tone mapping operators.

Vertexwahn
1

The accepted solution failed when it compared a float as if it were an integer.

This code works just fine:

float f;
uint8_t i;
//byte to float
f =CLAMP(((float)((i &0x0000ff))) /255.0, 0.0, 1.0);
//float to byte
i =((uint8_t)(255.0f *CLAMP(f, 0.0, 1.0)));

If you don't have CLAMP:

#define CLAMP(value, min, max) (((value) >(max)) ? (max) : (((value) <(min)) ? (min) : (value)))

Or for full RGB:

integer_color =((uint8_t)(255.0f *CLAMP(float_color.r, 0.0, 1.0)) <<16) |
               ((uint8_t)(255.0f *CLAMP(float_color.g, 0.0, 1.0)) <<8) |
               ((uint8_t)(255.0f *CLAMP(float_color.b, 0.0, 1.0))) & 0xffffff;

float_color.r =CLAMP(((float)((integer_color &0xff0000) >>16)) /255.0, 0.0, 1.0);
float_color.g =CLAMP(((float)((integer_color &0x00ff00) >>8)) /255.0, 0.0, 1.0);
float_color.b =CLAMP(((float)((integer_color &0x0000ff))) /255.0, 0.0, 1.0);
  • The one potential issue I see with accepted solution is it used an exact `==` instead of `>=`; I've submitted an edit to correct it (though the edit only matters if a float value gets slightly beyond 1.0.) it also includes a min/max "safe" version - are you alleging that is not safe? I have now added two simpler "safe" alternatives that are more obviously correct. – ToolmakerSteve Oct 05 '17 at 01:05
  • NOTE: your use of `255` makes your answer somewhat similar to [my newer answer](https://stackoverflow.com/a/46575472/199364). However, using `* 255` **without rounding**, as you do here, is definitely a mistake, in any language which **truncates** when converting to integer. Specifically, `(int)(0.999 * 255)` truncates to `254`, which is a poor choice: a value that close to 1.0 should become 255. In your formula it is almost impossible to get 255, unless the input is exactly 1.0. While I haven't tested it, I assume `(uint8_t)` has the same truncation characteristic. – ToolmakerSteve Oct 05 '17 at 01:14
1

Benefits and drawbacks of each method:

  1. (f * 256).clip(0, 255)
    • ✓ Uniformly sized intervals.
    • ✗ Does not correctly recover original image when a small noise term is added.
  2. (f * 255.999)
    • ✓ Uniformly sized intervals (with <0.0004% error).
    • ✗ Does not correctly recover original image when a small noise term is added.
    • ✓ Fastest.
  3. (f * 255).round()
    • ✗ Uniformly sized intervals within the range [1, 254], but uses half the size of those intervals for the endpoints 0 and 255.
    • ✓ Correctly recovers original image.

Recommendations:

  • Use method 1 if f is a "random variable" or does not come from an unprocessed image.
  • Use method 2 if you want something fast and simple.
  • Use method 3 if you want to robustly recover the original image that f is generated from.

Testing

>>> import numpy as np
>>> x = np.arange(256)
[0, 1, 2, ..., 253, 254, 255]

>>> f = x / 255
[0.000, 0.004, 0.008, ..., 0.992, 0.996, 1.000]

>>> def test(func, eps=1e-3):
...     print(
...         (x == func(f - eps)).all(),
...         (x == func(f)).all(),
...         (x == func(f + eps)).all(),
...     )

We now test which method best recovers the original x values from f:

>>> test(lambda f: (f * 256).clip(0, 255).astype(np.uint8))
False True False

>>> test(lambda f: (f * 255.999).astype(np.uint8))
False True False

>>> test(lambda f: (f * 255).round().astype(np.uint8))
True True True
Mateen Ulhaq
  • I don't think it is fair to say "correctly recovers original image" as it sounds very positive yet might do something very nasty depending on the situation, sure it resists change as much as possible but that's not necessarily a good thing if you're trying to change it. I also don't agree with the checkmarks next to things as depending on the situation they might be good or bad or make no difference at all. – inkredibl Apr 02 '21 at 08:47
  • "might do something very nasty depending on the situation" Can you give an example? I can't think of any offhand. When the data is generated from an image, method 3 is the only one which recovers the exact same RGB values. This is not a useful a property if the data is of an image that is processed or if the data is generated by sampling a distribution, but such data has no associated "original image" anyways. Particularly for data coming from a statistical distribution, I would prefer method 1 for its more faithful binning. – Mateen Ulhaq Apr 02 '21 at 08:55
  • First of all "correctly recovers" if and only if the function used to convert from byte to float was `byte/255.0` and in that case it resists small noise. Another thing is that it will map range (-0.5, 255.5) to (0, 255) which means 0 and 255 will only have half of the range of others. In case your goal is to convert image by the above function and then it resist change before converting back as much as possible, this might be a good solution, but that's not the only thing people are doing with colors. – inkredibl Apr 02 '21 at 09:17
0
public static byte floatToByte(float f)
{
     // Java's byte is signed: results above 127 read back as negative; use (b & 0xFF) to get 0..255.
     return (byte)(f * 255 % 256);
}

Values < 1 are accurately converted.

Values that, after conversion, fall between 255 and 256 are floored to 255 when converted to a byte.

Values > 1 are looped back to 0 using the % operator.

jonsca
Tyler
0

I believe the correct formula is floor(f*256), not round. This will map the interval 0..1 to exactly 256 zones of equal length.

[EDIT] and check 256 as a special case.

Pavel Radzivilovsky
  • floor(clamp(f, 0, 0.9999999)*256) – Martin Dec 16 '09 at 14:04
  • @Martin, better, though to be safe it is still necessary to clamp `f` to 1.0, if it is possible for there to be round-off error in f. So it doesn't necessarily simplify what needs to be done. Still, I like the suggestion - it does help if input values are known to be in valid range. – ToolmakerSteve Oct 05 '17 at 00:26
0

clamp(round(f * 256 - 0.5), 0, 255)

min(max(round(f * 256 - 0.5), 0), 255)

clamp(floor(f * 256), 0, 255) (rounding away from zero)

min(max(floor(f * 256), 0), 255) (rounding away from zero)

The formulas above convert from float [0..1] to float [-0.5..255.5] to byte [0..255]. But checking the Direct3D data conversion rules, they do it differently, with something equivalent to:

floor(clamp(f, 0, 1) * 255 + 0.5) (rounding away from zero)

floor(min(max(f, 0), 1) * 255 + 0.5) (rounding away from zero)

This formula converts from float [-0.00196..1.00196] to float [-0.5..255.5] to byte [0..255].

A similar approach would be:

round(clamp(f, 0, 1) * 255)

round(min(max(f, 0), 1) * 255)
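For reference, the clamp-then-round variant might look like this in C++. It is a sketch only: `float_to_unorm8` is a name made up here, and mapping NaN to 0 is an assumption of this example rather than something stated above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: clamp to [0, 1], scale by 255, add 0.5 and drop the
// fraction, i.e. floor(clamp(f, 0, 1) * 255 + 0.5).
inline std::uint8_t float_to_unorm8(float f) {
    if (std::isnan(f)) {
        return 0;  // assumption made for this sketch
    }
    const float clamped = std::min(std::max(f, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::floor(clamped * 255.0f + 0.5f));
}
```

Since the input is clamped before scaling, the value passed to the cast always lies in [0.5, 255.5], so no separate clamp of the result is needed.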

Wat