Convert 24-bit two's complement to float32_t

Question

I have a quite specific question.
An ADC gives me 24-bit data points in two's complement. Usually I store them in a 32-bit int (also two's complement) by copying them in starting from the MSB of the int and then shifting them 8 bits towards the LSB, so that the leading one or zero is preserved.
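A minimal sketch of the approach described above, assuming the three bytes of each sample arrive most significant byte first (`sign_extend_24` is a hypothetical name). The bytes are packed into the top 24 bits of a 32-bit word, which is then arithmetically shifted right by 8 so the sign bit fills the upper byte. Converting the packed `uint32_t` to `int32_t` and right-shifting a negative value are implementation-defined in C; GCC and Clang on ARM treat them as two's complement wrap-around and arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend one 24-bit two's-complement sample
   given as three bytes, most significant byte first. */
int32_t sign_extend_24(const uint8_t b[3])
{
    /* Place the 24 data bits in bits 31..8 of the word. */
    uint32_t packed = ((uint32_t)b[0] << 24) |
                      ((uint32_t)b[1] << 16) |
                      ((uint32_t)b[2] << 8);

    /* Reinterpret as signed and shift back down; the arithmetic shift
       copies bit 31 (the original bit 23) into the top 8 bits.
       Both steps are implementation-defined, but GCC/Clang on ARM
       behave as described. */
    return (int32_t)packed >> 8;
}
```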

Now I want to use the CMSIS-DSP library on an ARM processor to compute an FFT. The FFT expects float32_t input. I had never heard of this data type and can't find any specific sources saying whether it is a fixed-point format, a floating-point format or something else...

Can anyone tell me what exactly float32_t is? Additionally, any thoughts on converting the 24-bit two's complement values into float32_t?

I'll keep investigating and will edit this post if I find anything new :-)

If anyone is interested:
The ADC is the TI ADS1299.
The CMSIS-DSP library can be found here.
The link goes directly to the method I want to use (arm_rfft_f32()). Since I can only use an older version of the library, I'm using this method even though it is already marked as deprecated.

Thanks & Greetings!

Go to that documentation and **click on `float32_t`**; then you'll know what it is. In the same file there's also `void arm_q31_to_float (q31_t *pSrc, float32_t *pDst, uint32_t blockSize)`. — user3528438, May 26 '16 at 18:38
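A 24-bit sample moved into the top of a 32-bit word is already a Q31 value (a sign bit plus 31 fractional bits, range [-1, 1)), so it can be handed to `arm_q31_to_float`. The sketch below is a plain-C stand-in written for illustration, not the library code; it mirrors the scaling documented for that function, a division by 2^31. The names `int24_to_q31` and `q31_to_float_ref` are hypothetical.

```c
#include <stdint.h>

typedef float float32_t;   /* matches the CMSIS-DSP typedef */
typedef int32_t q31_t;     /* Q31: sign bit + 31 fractional bits, range [-1, 1) */

/* Hypothetical helper: a sign-extended 24-bit sample becomes Q31 by moving
   it into bits 31..8, i.e. multiplying by 2^8 (cannot overflow for 24-bit input). */
q31_t int24_to_q31(int32_t sample24)
{
    return (q31_t)(sample24 * 256);
}

/* Hypothetical plain-C stand-in for arm_q31_to_float(): divide each Q31
   value by 2^31 so the result lies in [-1.0, 1.0). */
void q31_to_float_ref(const q31_t *pSrc, float32_t *pDst, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        pDst[n] = (float32_t)pSrc[n] / 2147483648.0f;
    }
}
```

Because the low 8 bits of each Q31 value are zero, every 24-bit sample converts exactly. With the real library you would call `arm_q31_to_float` on the Q31 buffer instead of the stand-in.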
From fixed point to floating point, you simply promote it to a floating-point value and divide or multiply it by a constant ratio whose value depends on the range you want the converted float to be in. — user3528438, May 26 '16 at 18:40
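A sketch of the promote-and-scale approach from the comment above, with the ratio passed in as the value of one LSB (`int24_to_scaled` is a hypothetical name). The actual ratio is an assumption you would take from your ADC setup; for the ADS1299 it depends on the reference voltage and PGA gain, so check the datasheet. Multiplying by a precomputed ratio avoids a per-sample division.

```c
#include <stdint.h>

typedef float float32_t;   /* matches the CMSIS-DSP typedef */

/* Hypothetical helper: promote a sign-extended 24-bit sample to float and
   multiply by the value of one LSB. With lsb = 1 / 2^23 the output range
   is [-1, 1); with a volts-per-LSB figure the output is in volts. */
float32_t int24_to_scaled(int32_t sample24, float32_t lsb)
{
    return (float32_t)sample24 * lsb;
}
```

For example, `int24_to_scaled(x, 1.0f / 8388608.0f)` normalises to [-1, 1); compute the ratio once, outside the sample loop.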
@Chuchaki I don't fully understand the question but it looks like the scope might be better suited for a different [stack exchange site](http://stackexchange.com/sites?view=list#traffic). Alternatively you might want to find better [tags](http://stackoverflow.com/tags) to increase the question's visibility. — 0x6C38, May 26 '16 at 18:48
Hey, thanks for the fast answers. The documentation itself just says "typedef float float32_t", with no further information. I don't know the details of the Q31 format, but "reverse engineering" the function and learning about Q31 would be my next idea. Since the timing is critical, I want to use bit operations and no type casting or division. I would like to know where the "data" ends and where the mantissa starts, so that I could just "put" the bits in their place. — Chuchaki, May 26 '16 at 18:49
@MrD: Basically I'm looking for a way to convert a 24-bit two's complement value to float32_t using bit operations. — Chuchaki, May 26 '16 at 18:54
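For illustration, here is a sketch of what "putting the bits in place" looks like using only integer operations (`int24_to_float_bits` is a hypothetical name; it assumes `float` is IEEE 754 single precision, as with ARM GCC). Every 24-bit integer fits in the 24-bit significand (23 stored bits plus the implicit leading 1), so no rounding is needed: find the highest set bit, derive the exponent from it, and shift the remaining bits into the mantissa field. It is meant to show the format, not to outperform the compiler's own conversion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build an IEEE 754 single-precision value from a
   sign-extended 24-bit integer using integer operations only.
   Exact for the whole range -2^23 .. 2^23 - 1. */
float int24_to_float_bits(int32_t x)
{
    uint32_t sign = 0, mag, bits;
    int msb = 0;
    float f;

    if (x == 0)
        return 0.0f;                 /* zero is the all-zero bit pattern */

    if (x < 0) {
        sign = 0x80000000u;          /* bit 31: sign */
        mag = (uint32_t)(-x);        /* |x| <= 2^23, so no overflow */
    } else {
        mag = (uint32_t)x;
    }

    /* Index of the highest set bit (0..23). A count-leading-zeros
       instruction (e.g. via GCC's __builtin_clz) could replace this loop. */
    while ((mag >> (msb + 1)) != 0)
        msb++;

    bits = sign
         | ((uint32_t)(msb + 127) << 23)          /* bits 30..23: biased exponent */
         | ((mag << (23 - msb)) & 0x007FFFFFu);   /* bits 22..0: fraction, implicit 1 dropped */

    memcpy(&f, &bits, sizeof f);     /* reinterpret the bit pattern as float */
    return f;
}
```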
Er, what's wrong with `float32_t float_val = signed_int_val;`? The compiler emits the appropriate floating-point conversion instruction, job done. I think you're overthinking this... — Notlikethat, May 26 '16 at 19:38

Notlikethat · Accepted Answer · 2016-05-26T21:01:11.383

Often the most obvious solution also turns out the best. If I had to sign-extend a 24-bit number and convert it to a floating-point type, I'd start by writing something like this:

// See Dric512's answer; I happen to know my compiler's ABI implements 
// 'float' with the appropriate IEEE 754 single-precision format
typedef float float32_t; 

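float32_t conv_func(unsigned int int24) {
        // Move bit 23 (the sign bit) up to bit 31, then arithmetic-shift back
        // down so it is replicated into the top 8 bits (sign extension).
        // The cast and the signed right shift rely on GCC's documented
        // implementation-defined behaviour (modulo conversion, arithmetic shift).
        return (int)(int24 << 8) >> 8;
}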

Since you mention both CMSIS and critical timing, I'm going to safely assume your micro has a Cortex-M4 (or possibly Cortex-M7) with a hardware FPU - the words "performance" and "software floating-point FFT" go together pretty laughably - and that since it's the 21st century you're using a half-decent optimising compiler, so I compiled the above thusly:

$ arm-none-eabi-gcc -c -Os -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb float.c

and got this out of it (comments added for clarity):

   0:   f340 0017       sbfx    r0, r0, #0, #24   @ sign-extend 24-bit value from argument
   4:   ee07 0a90       vmov    s15, r0           @ move 32-bit result to FPU register
   8:   eeb8 0ae7       vcvt.f32.s32    s0, s15   @ convert signed int to 32-bit float
   c:   4770            bx      lr                @ return (with final result in FPU)

Well, that looks like optimal code already - there's no way any manual bit-twiddling is gonna beat a mere 2 single-cycle instructions (SBFX and VCVT). Job done!

And if you do happen to be stuck without an FPU, then the fundamental point of the answer remains unchanged - let the compiler/library do the dirty work, because the soft-fp library's conversion implementation will be:

Reliably correct.
Pretty well optimised.
Entirely lost in the noise compared to the overhead of the calculations themselves.
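A sketch of the approach this answer recommends, applied to a whole buffer before the FFT: sign-extend each sample and let the compiler do the int-to-float conversion. On a Cortex-M4F this becomes VCVT; on a Cortex-M3, GCC calls its soft-float runtime instead (the ARM EABI helper `__aeabi_i2f`). The buffer layout (raw codes right-aligned in the low 24 bits of each word) and the name `samples_to_float32` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef float float32_t;   /* matches the CMSIS-DSP typedef */

/* Hypothetical helper: convert raw 24-bit ADC codes (right-aligned in the
   low 24 bits of each word) into a float32_t buffer, e.g. as FFT input. */
void samples_to_float32(const uint32_t *raw, float32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Portable sign extension: flip bit 23, then subtract 2^23. */
        int32_t s = (int32_t)((raw[i] & 0x00FFFFFFu) ^ 0x00800000u) - 0x00800000;
        out[i] = (float32_t)s;   /* compiler picks VCVT or a soft-float call */
    }
}
```

The XOR-and-subtract form avoids the implementation-defined cast and shift, and compiles to a couple of integer instructions.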

Hey! I'm only using an M3, and I think it doesn't have floating-point support (at least not at the hardware level). But I agree that your solution would be optimal. Great answer! — Chuchaki, May 26 '16 at 20:28

score 2 · Answer 2 · answered May 26 '16 at 19:53

float32_t is the standard IEEE 754 32-bit (single-precision) floating-point format, which (like float64_t) is one of the formats handled by the hardware floating-point unit found in several ARM CPUs.

There is 1 sign bit (bit 31), 8 exponent bits (bits 30 to 23) and 23 mantissa bits (bits 22 to 0): https://en.wikipedia.org/wiki/Single-precision_floating-point_format
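To see that layout directly, here is a short sketch (`float_fields` is a hypothetical name) that copies a `float` into a `uint32_t` with `memcpy` and splits out the three fields. It assumes `float` is IEEE 754 single precision, as on ARM GCC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a float into its IEEE 754 fields. */
void float_fields(float f, uint32_t *sign, uint32_t *exponent, uint32_t *mantissa)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* well-defined way to read the bit pattern */
    *sign     = bits >> 31;              /* bit 31 */
    *exponent = (bits >> 23) & 0xFFu;    /* bits 30..23, biased by 127 */
    *mantissa = bits & 0x7FFFFFu;        /* bits 22..0, implicit leading 1 not stored */
}
```

For example, -6.0f = -1.5 × 2^2 gives sign 1, exponent 129 (2 + 127) and mantissa 0x400000 (the fractional 0.5).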

If your CPU has a hardware floating-point unit, you can directly use the instruction that converts a 32-bit integer to a 32-bit float (VCVT).
