For example, let's say 18.xxx is passed to a function as a float value. It is first truncated down to 18.0, then encoded as 0 10011 0010000 (sign, exponent, fraction), which fits the desired 13-bit float format, and is returned as an int with decimal value 2448. Does anyone know how this can be accomplished using shifts?
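To make the arithmetic behind that bit pattern explicit, here is a minimal sketch. It assumes a 1-bit sign, a 5-bit exponent with a bias of 15, and a 7-bit fraction. The bias is not stated in the question; it is inferred from the example (18 is 1.001 in binary times 2^4, and 4 + 15 = 19, which is 10011 in binary). The names `pack13` and `encode_eighteen` are hypothetical.

```c
/* Hypothetical helper: pack a sign bit, a 5-bit biased exponent and a
   7-bit fraction into the low 13 bits of an unsigned value. */
unsigned pack13(unsigned sign, unsigned exponent, unsigned fraction) {
    return ((sign & 0x1u) << 12) | ((exponent & 0x1fu) << 7) | (fraction & 0x7fu);
}

/* Worked example for 18.0:
   18 = 1.001 (binary) x 2^4, so the unbiased exponent is 4 and the
   fraction bits after the implicit leading 1 are 0010000 (0x10).
   The bias of 15 is an assumption inferred from the question's example. */
unsigned encode_eighteen(void) {
    return pack13(0u, 4u + 15u, 0x10u);
}
```

Calling `encode_eighteen()` yields 2448, matching the value in the question.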
- Well, I suppose you might reinterpret the bit representation and extract the top 13 bits, assuming a 32-bit IEEE-754 representation, perhaps with a bit of rounding added in. Doing so wastes a full 8 bits on the exponent range, leaving only an effective 5 bits of precision. With such limited space, I would suggest considering a specialized representation tailored to your data set, perhaps straight fixed-point data. – doynax Sep 29 '16 at 06:00
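As a rough illustration of the comment's first suggestion, the sketch below keeps only the top 13 bits of the 32-bit pattern (1 sign bit, 8 exponent bits, 4 fraction bits), with an optional rounding variant. It assumes `float` is IEEE 754 binary32 and does not check that; the function names are hypothetical, and infinities and NaN are not treated specially.

```c
#include <stdint.h>
#include <string.h>

/* Keep the top 13 bits of a binary32 pattern: 1 sign, 8 exponent, 4 fraction.
   Assumes float is IEEE 754 binary32 (not verified here). */
uint16_t float32_top13(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bits without aliasing issues */
    return (uint16_t)(bits >> 19);    /* discard the low 19 fraction bits */
}

/* Same, but round to nearest (ties away from zero) by adding half of the
   discarded range before shifting. A carry out of the fraction bumps the
   exponent, which is the desired result for finite values. */
uint16_t float32_top13_rounded(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)((bits + (UINT32_C(1) << 18)) >> 19);
}
```

With this layout 18.0 maps to 2098 rather than 2448, because 8 of the 13 bits go to the exponent.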
1 Answer
This might do what you want, assuming your floating-point number is stored in the 32-bit IEEE 754 single-precision binary format, whose exponent field holds a biased, unsigned value:

    #include <stdio.h>
    #include <string.h>
    #include <assert.h>

    unsigned short float32to13(float f) {
        assert(sizeof(float) == sizeof(unsigned int));

        unsigned int g;
        memcpy(&g, &f, sizeof(float)); // allow us to examine a float bit by bit

        unsigned int sign32 = (g >> 0x1f) & 0x1; // one bit sign
        unsigned int exponent32 = ((g >> 0x17) & 0xff) - 0x7f; // unbias 8 bits of exponent
        unsigned int fraction32 = g & 0x7fffff; // 23 bits of significand

        assert(((exponent32 + 0xf) & ~ 0x1f) == 0); // don't overflow smaller exponent

        unsigned short sign13 = sign32;
        unsigned short exponent13 = exponent32 + 0xf; // rebias exponent by smaller amount
        unsigned short fraction13 = fraction32 >> 0x10; // drop lower 16 bits of significand precision

        return sign13 << 0xc | exponent13 << 0x7 | fraction13; // assemble a float13
    }

    int main(void) {
        float f = 18.0f;
        printf("%u\n", float32to13(f));
        return 0;
    }

OUTPUT

    > ./a.out
    2448
    >

I leave any endian issues and additional error checking to the end user. This example is provided only to demonstrate to the OP the types of shifts necessary to convert between floating point formats. Any resemblance to actual floating point formats is purely coincidental.
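For completeness, the reverse direction uses the same shifts in mirror image. The following is a minimal sketch and not part of the original answer: the name `float13to32` is hypothetical, and it assumes the same layout as the code above (1 sign bit, 5 exponent bits biased by 15, 7 fraction bits) and treats every bit pattern as a normalized number, with no zero, subnormals, infinities, or NaN.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder: expand a 13-bit value (1 sign, 5 exponent, 7 fraction)
   back into an IEEE 754 binary32 float. Assumes a bias of 15 for the small
   format and that float is binary32. */
float float13to32(uint16_t h) {
    uint32_t sign     = (h >> 12) & 0x1u;   /* one bit of sign */
    uint32_t exponent = (h >> 7) & 0x1fu;   /* 5 bits of biased exponent */
    uint32_t fraction = h & 0x7fu;          /* 7 bits of significand */

    uint32_t bits = (sign << 31)
                  | ((exponent + (127u - 15u)) << 23)  /* rebias 15 -> 127 */
                  | (fraction << 16);                  /* low 16 bits become zero */

    float f;
    memcpy(&f, &bits, sizeof f);            /* reinterpret the bits as a float */
    return f;
}
```

Under those assumptions, `float13to32(2448)` gives back 18.0.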

cdlane
- Undefined behavior. You break the strict aliasing rule. The rest of the code works only on specific platforms. – 2501 Sep 29 '16 at 07:25
- @2501, I've revised the code to address the strict aliasing rule issue per [this post on float bits and strict aliasing](http://stackoverflow.com/questions/4328342/float-bits-and-strict-aliasing) – cdlane Sep 29 '16 at 07:36
- Great! The code still makes assumptions, specifically about the implementation of real types. Please address them. – 2501 Sep 29 '16 at 07:39
- I can see what you've done here, @cdlane. It looks great. I will have to tweak it and implement it slightly differently due to my specifications for this. Thanks a lot! – Mau Sep 29 '16 at 20:13
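On the remaining assumptions about the floating-point implementation raised in the comments, one possible approach is to state them as compile-time checks and to report out-of-range exponents through a return value instead of `assert`. The sketch below is not the answerer's code: `float32to13_checked` is a hypothetical name, it requires C11 for `_Static_assert`, and the `<float.h>` checks only suggest an IEEE 754 binary32 layout rather than prove it.

```c
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Fail the build if float does not have binary32-like characteristics. */
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 &&
               FLT_MAX_EXP == 128 && FLT_MIN_EXP == -125,
               "float does not look like IEEE 754 binary32");
_Static_assert(sizeof(float) * CHAR_BIT == 32, "float is not 32 bits wide");

/* Hypothetical variant of float32to13: returns 1 and stores the 13-bit
   encoding in *out on success, or returns 0 if the rebiased exponent does
   not fit in 5 bits (this includes 0.0, subnormals, infinities and NaN). */
int float32to13_checked(float f, uint16_t *out) {
    uint32_t g;
    memcpy(&g, &f, sizeof g);                               /* examine the float's bits */

    uint32_t sign     = g >> 31;                            /* one bit of sign */
    int32_t  exponent = (int32_t)((g >> 23) & 0xffu) - 127; /* unbias 8-bit exponent */
    uint32_t fraction = g & 0x7fffffu;                      /* 23 bits of significand */

    int32_t rebiased = exponent + 15;                       /* assumed bias of 15 */
    if (rebiased < 0 || rebiased > 31)
        return 0;                                           /* exponent out of range */

    *out = (uint16_t)((sign << 12) | ((uint32_t)rebiased << 7) | (fraction >> 16));
    return 1;
}
```

These checks do not cover platforms where floats and integers are stored with different byte orders, so that caveat from the answer still applies.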