
I have a C++ program that can be compiled for single or double precision floating point numbers. Similar to what is explained here (Switching between float and double precision at compile time), I have a header file which defines:

typedef double dtype;

or:

typedef float dtype;

depending on whether single or double precision is required by the user. When declaring variables and arrays I always use the data type dtype, so the correct precision is used throughout the code.

My question is how can I, in a similar fashion, set the data type of hard-coded numbers in the code, like for instance in this example:

dtype var1 = min(var0, 3.65);

As far as I know, the literal 3.65 is double precision by default, and it becomes single precision if I write:

dtype var1 = min(var0, 3.65f);

But is there a way to define a literal, for instance like this:

dtype var1 = min(var0, 3.65_dt);

that can be defined as either float or double at compile time, to ensure that hard-coded numbers in the code also have the right precision?

Currently, I cast the number to dtype like this:

dtype var1 = min(var0, (dtype)3.65);

but I am concerned that this might create overhead in the single-precision case, since the program might actually create a double precision number which is then cast to a single precision number. Is this indeed the case?

  • `constexpr dtype x = 3.65;` `x` will be calculated at compile time. But I would expect that to be the case with your code as well. – john Mar 03 '23 at 20:14
  • Macros really shouldn't be entering the scene at all. You should be able to express all of this with fairly straight forward templates – Brian61354270 Mar 03 '23 at 20:16
  • Concerning the literal: https://en.cppreference.com/w/cpp/language/user_literal – joergbrech Mar 03 '23 at 20:35

1 Answer


You can do this with a macro that appends an `f` suffix for float, as with `#define foo(x) x##f`, and does not for double, as with `#define foo(x) x`.

While you can also coerce constants to become float values with casts or various induced conversions, this creates a double-rounding process: The literal in source text is first converted to double and then converted to float. In about one instance in 2^29, this produces a different result than if the literal is directly converted to float.

(2^29 is due to the difference in the numbers of bits in the significands of the formats commonly used for float and double, 24 and 53. This assumes a uniform distribution for the bit patterns in the representation. Practical data may have a different distribution.)

Eric Postpischil