Division by and subsequent multiplication with the same floating point number

Question

The following code performs an empirical analysis of the behavior of division by and subsequent multiplication with the same floating point number.

When looking at the output of the program without rounding, the behavior seems somewhat arbitrary to me. I was expecting a less jumpy response. My question now is: Can anybody formulate some kind of rule that explains the results or provide some other kind of deep insight, maybe also for the case with rounding.

I am aware of this post, which is related to the special case of b = 1.

For simplicity, I suggest to restrict answers to IEEE-754 32-bit single precision float (and 32-bit unsigned int, which does not really matter here).

How I came across this topic: Assume having an 8-bit grayscale image (unsigned char instead of unsigned int in that case) and converting it to a 32-bit float image, thereby dividing by b = 255.0f in order to map {0, 1, ..., 255} to the interval [0, 1]. Now when converting back to {0, 1, ..., 255} by multiplying with 255.0f, will one encounter imprecisions? According to my empirical analysis the answer is no (see output without rounding below, 1023 is beyond 255), but e.g. for 252.0f the answer would be yes. Of course one can handle this by adding 0.5f in order to round when converting back to unsigned char - which should be done anyways for the general case, as usually one converts back a processed version of the floating point image and not an untouched one. Still, I started to wonder if adding 0.5f for rounding can be omitted for the untouched case.

#include <limits.h>
#include <stdio.h>

unsigned int foo(float b) {
    unsigned int a;

    for (a = 0; a < UINT_MAX; a++) {
        unsigned int a_ = (float) a / b * b; // without rounding
        //unsigned int a_ = (float) a / b * b + 0.5f; // with rounding

        if (a_ != a) {
            break;
        }
    }

    return a;
}

int main() {
    for (unsigned int b = 1; b < 256; b++) {
        printf("%u:%u, ", b, foo((float) b));
    }

    return 0;
}

Output without rounding:
1:16777217, 2:16777217, 3:12582913, 4:16777217, 5:10485763, 6:12582913, 7:31, 8:16777217, 9:9437189, 10:10485763, 11:13, 12:12582913, 13:53, 14:31, 15:63, 16:16777217, 17:8912905, 18:9437189, 19:27, 20:10485763, 21:57, 22:13, 23:7, 24:12582913, 25:53, 26:53, 27:29, 28:31, 29:15, 30:63, 31:997, 32:16777217, 33:8650769, 34:8912905, 35:103, 36:9437189, 37:3, 38:27, 39:107, 40:10485763, 41:1, 42:57, 43:15, 44:13, 45:113, 46:7, 47:1, 48:12582913, 49:27, 50:53, 51:119, 52:53, 53:15, 54:29, 55:1, 56:31, 57:63, 58:15, 59:31, 60:63, 61:1, 62:997, 63:255, 64:16777217, 65:8519713, 66:8650769, 67:55, 68:8912905, 69:15, 70:103, 71:13, 72:9437189, 73:45, 74:3, 75:49, 76:27, 77:29, 78:107, 79:3, 80:10485763, 81:11, 82:1, 83:1, 84:57, 85:217, 86:15, 87:27, 88:13, 89:27, 90:113, 91:223, 92:7, 93:49, 94:1, 95:15, 96:12582913, 97:1, 98:27, 99:7, 100:53, 101:115, 102:119, 103:29, 104:53, 105:237, 106:15, 107:1, 108:29, 109:1, 110:1, 111:15, 112:31, 113:29, 114:63, 115:1, 116:15, 117:249, 118:31, 119:251, 120:63, 121:1, 122:1, 123:1, 124:997, 125:251, 126:255, 127:2037, 128:16777217, 129:8454209, 130:8519713, 131:21, 132:8650769, 133:5, 134:55, 135:7, 136:8912905, 137:47, 138:15, 139:9, 140:103, 141:1, 142:13, 143:41, 144:9437189, 145:73, 146:45, 147:11, 148:3, 149:3, 150:49, 151:31, 152:27, 153:413, 154:29, 155:23, 156:107, 157:161, 158:3, 159:1, 160:10485763, 161:41, 162:11, 163:31, 164:1, 165:1, 166:1, 167:45, 168:57, 169:27, 170:217, 171:23, 172:15, 173:45, 174:27, 175:29, 176:13, 177:7, 178:27, 179:25, 180:113, 181:187, 182:223, 183:3, 184:7, 185:15, 186:49, 187:59, 188:1, 189:1, 190:15, 191:3, 192:12582913, 193:13, 194:1, 195:455, 196:27, 197:13, 198:7, 199:57, 200:53, 201:7, 202:115, 203:27, 204:119, 205:15, 206:29, 207:7, 208:53, 209:1, 210:237, 211:53, 212:15, 213:7, 214:1, 215:59, 216:29, 217:59, 218:1, 219:1, 220:1, 221:481, 222:15, 223:15, 224:31, 225:61, 226:29, 227:1, 228:63, 229:29, 230:1, 231:29, 232:15, 233:121, 234:249, 235:1, 236:31, 237:1, 238:251, 239:121, 240:63, 241:501, 242:1, 243:1, 244:1, 245:1, 246:1, 247:989, 248:997, 249:1, 250:251, 251:253, 252:255, 253:507, 254:2037, 255:1023

Output with rounding:
1:8388609, 2:8388609, 3:8388609, 4:8388609, 5:8388609, 6:8388609, 7:7340036, 8:8388609, 9:8388609, 10:8388609, 11:5767178, 12:8388609, 13:6815751, 14:7340036, 15:7864328, 16:8388609, 17:8388609, 18:8388609, 19:4980747, 20:8388609, 21:5505025, 22:5767178, 23:6029316, 24:8388609, 25:6553602, 26:6815751, 27:7077891, 28:7340036, 29:7602181, 30:7864328, 31:8126480, 32:8388609, 33:8388609, 34:8388609, 35:4587552, 36:8388609, 37:4849688, 38:4980747, 39:5111812, 40:8388609, 41:5373953, 42:5505025, 43:5636130, 44:5767178, 45:5898246, 46:6029316, 47:6160391, 48:8388609, 49:6422530, 50:6553602, 51:6684674, 52:6815751, 53:6946819, 54:7077891, 55:7208964, 56:7340036, 57:7471109, 58:7602181, 59:7733254, 60:7864328, 61:7995403, 62:8126480, 63:8257568, 64:8388609, 65:8388609, 66:8388609, 67:4390951, 68:8388609, 69:4522008, 70:4587552, 71:4653094, 72:8388609, 73:4784130, 74:4849688, 75:4915205, 76:4980747, 77:5046294, 78:5111812, 79:5177353, 80:8388609, 81:5308417, 82:5373953, 83:5439489, 84:5505025, 85:5570561, 86:5636130, 87:5701646, 88:5767178, 89:5832710, 90:5898246, 91:5963780, 92:6029316, 93:6094852, 94:6160391, 95:6225933, 96:8388609, 97:6356994, 98:6422530, 99:6488066, 100:6553602, 101:6619138, 102:6684674, 103:6750219, 104:6815751, 105:6881283, 106:6946819, 107:7012355, 108:7077891, 109:7143427, 110:7208964, 111:7274500, 112:7340036, 113:7405572, 114:7471109, 115:7536645, 116:7602181, 117:7667718, 118:7733254, 119:7798791, 120:7864328, 121:7929865, 122:7995403, 123:8060941, 124:8126480, 125:8192021, 126:8257568, 127:8323136, 128:8388609, 129:8388609, 130:8388609, 131:4292728, 132:8388609, 133:4358217, 134:4390951, 135:4423704, 136:8388609, 137:4489235, 138:4522008, 139:4554755, 140:4587552, 141:4620296, 142:4653094, 143:4685831, 144:8388609, 145:4751362, 146:4784130, 147:4816921, 148:4849688, 149:4882455, 150:4915205, 151:4947976, 152:4980747, 153:5013530, 154:5046294, 155:5079047, 156:5111812, 157:5144580, 158:5177353, 159:5210126, 160:8388609, 161:5275649, 162:5308417, 163:5341185, 164:5373953, 165:5406721, 166:5439489, 167:5472257, 168:5505025, 169:5537793, 170:5570561, 171:5603458, 172:5636130, 173:5668884, 174:5701646, 175:5734410, 176:5767178, 177:5799944, 178:5832710, 179:5865478, 180:5898246, 181:5931019, 182:5963780, 183:5996548, 184:6029316, 185:6062084, 186:6094852, 187:6127623, 188:6160391, 189:6193162, 190:6225933, 191:6258713, 192:8388609, 193:6324226, 194:6356994, 195:6389762, 196:6422530, 197:6455298, 198:6488066, 199:6520834, 200:6553602, 201:6586370, 202:6619138, 203:6651906, 204:6684674, 205:6717495, 206:6750219, 207:6782983, 208:6815751, 209:6848515, 210:6881283, 211:6914051, 212:6946819, 213:6979587, 214:7012355, 215:7045123, 216:7077891, 217:7110659, 218:7143427, 219:7176195, 220:7208964, 221:7241732, 222:7274500, 223:7307268, 224:7340036, 225:7372804, 226:7405572, 227:7438340, 228:7471109, 229:7503877, 230:7536645, 231:7569413, 232:7602181, 233:7634950, 234:7667718, 235:7700486, 236:7733254, 237:7766023, 238:7798791, 239:7831560, 240:7864328, 241:7897097, 242:7929865, 243:7962634, 244:7995403, 245:8028172, 246:8060941, 247:8093710, 248:8126480, 249:8159250, 250:8192021, 251:8224794, 252:8257568, 253:8290347, 254:8323136, 255:8355968

So basically your code finds the first value at which there is error. With rounding the number is much higher. Not sure what you mean by untouched case. — Rishikesh Raje, Nov 30 '17 at 12:44
It would be helpful if your output contained the values `b` too. — Bathsheba, Nov 30 '17 at 12:45
@RishikeshRaje "untouched" is related to the image processing example that I mentionned: converting an 8-bit grayscale image to floating point, not processing the resulting floating point image further thus leaving it untouched, and then converting it back to 8-bit. — Pedro, Nov 30 '17 at 12:48
@Bathsheba I had thought about this, but decided to leave out the values `b` out in favor of readability and compactness of the post. If one wants to look at this topic further, I would suggest to compile the code and add the value `b` and a line break to the printf. — Pedro, Nov 30 '17 at 12:55
@PedramAzad: Or folk willing to answer questions for free will answer a question from someone who understands that much better. — Bathsheba, Nov 30 '17 at 12:56
Too broad for me sadly. The patterns are obvious once you've read lots of articles on floating point. Note that 16777217 is the first integer that can't be represented exactly in a `float`. That accounts for `b` being 1, 2, 4, 8, 16, 32, ... — Bathsheba, Nov 30 '17 at 13:08
And this site might help you with your research: http://www.exploringbinary.com/floating-point-converter/ — Bathsheba, Nov 30 '17 at 13:11
I was aware of 2^24+1=16777217. I was mainly curious about an explanation for the jumps, e.g. 63:255, 64:16777217. Now, looking at `b` when being a power of 2 brings some nice insight, thank you. — Pedro, Nov 30 '17 at 13:17
Offhand: `unsigned int a_ = (float) a / b * b` produces `a` iff `(float) a / b * b` is greater than or equal to `a`. This occurs in two ways. (1) If `(float a) / b` rounds up or not at all, then `(float) a / b * b` is greater than or equal to a. (2) If `(float a) / b)` rounds down but the subsequent multiplication by `b` rounds up (by enough to restore `a`, or is that always true if round up occurs here?). In either case, the decision about which way to round depends on the bits being rounded away. These are “sort of random”—they are happenstance of the representations of fractions in binary… — Eric Postpischil, Nov 30 '17 at 13:21
… by which I mean I doubt there is any meaningful pattern in them. You might make some number-theoretic analysis of them, but it may not be particularly illuminating. — Eric Postpischil, Nov 30 '17 at 13:22
@EricPostpischil: Is it really "iff"? Consider `(unsigned int) ((float) 33554438 / 1.0f * 1.0f) = 33554440'. — Pedro, Nov 30 '17 at 13:51
@PedramAzad: You cannot divide 33554438 by 1 in IEEE-754 32-bit binary because 33554438 does not exist in IEEE-754 32-bit binary. Although I wrote `(float) a / b * b` above, I was just mirroring the notation in the question because it would have been cumbersome to describe the changes necessary to discuss floating-point correctly, and because comments are limited in formatting and length. My intent was to discuss only the floating-point division and multiplication. For any `unsigned int` value not representable in `float`, the result necessary differs. — Eric Postpischil, Nov 30 '17 at 14:55
Please post the output of `#include ... printf("%FEM: %d\n"<, FLT_EVAL_METHOD);` for your platform. it is very useful in discussing this type of question. — chux - Reinstate Monica, Nov 30 '17 at 22:20

chux - Reinstate Monica · Answer 1 · 2017-11-30T23:55:11.900

Still, I started to wonder if adding 0.5f for rounding can be omitted for the untouched case.

To address part of OP's question with b==255. (OP's original problem)

It makes sense that gray-scale conversion would use a fixed/constant divider and a fixed multiplier. Instead of using the same exact values for both, use slightly different ones. Although this is not "same floating point number", I think it meets OP's original goal.

Code can get the expected matching a == a_ up to 8,355,967 with no + 0.5f.

float b1 = 255.0f;
float b2 = nextafterf(255.0f, 256.0f);
unsigned int a_ = (unsigned int)  ((float) a / b1 * b2);

For 1 <= b <= 255, the lowest a is 4,227,104 at b == 129/

Even though this is not the answer to my original question, I upvoted it, because it definitely is an interesting and useful idea, which I will keep in mind. — Pedro, Dec 01 '17 at 09:37

Division by and subsequent multiplication with the same floating point number

1 Answers1