What is the best way in C++ to cast between integer types of different signedness?

Question

The communication peer sends a uint64_t data field carrying an order ID, which I need to store in a PostgreSQL 11 database that does NOT support unsigned integer types. Although a real value may exceed 2^63, I think an INT8 field in PostgreSQL 11 can hold it if I do the casting carefully.

Let's say we have:

uint64_t order_id = 123; // received
int64_t  to_db;          // to be written into the db

I plan to use one of the following methods to cast a uint64_t value into an int64_t value:

to_db = order_id; // directly assigning;
to_db = (int64_t)order_id; //c-style casting;
to_db = static_cast<int64_t>(order_id);
to_db = *reinterpret_cast<const int64_t*>( &order_id );

When I need to load it from the db, I can do the reverse cast.

I know they all work; I'm just interested in which one conforms most closely to the C++ standard.

In other words, which method will always work on any 64-bit platform with any compiler?

Every one of these is undefined behaviour if the value exceeds 2^63-1. — n. m. could be an AI, Dec 02 '20 at 09:40
There is an additional option: `memcpy(&to_db, &order_id, 8);`. — Daniel Langr, Dec 02 '20 at 09:52
@DanielLangr that works, as long as the system that wrote those values and the system that reads them are in the same conditions — Swift - Friday Pie, Dec 02 '20 at 09:58
@n.'pronouns'm. Moreover, AFAIK, the 4th case always yields undefined behavior. — Daniel Langr, Dec 02 '20 at 09:59

Swift - Friday Pie · Answer 1 · 2020-12-02T10:17:50.660

It depends on where the code is compiled and run: none of these is fully portable without C++20 support.

The safest way without that is to do the conversion yourself by shifting the range of values, something like this:

// needs <climits> and <cstdint>
int64_t to_db = (order_id > (uint64_t)LLONG_MAX) 
           ? int64_t(order_id - (uint64_t)LLONG_MAX - 1) 
           : int64_t(order_id) + LLONG_MIN;

uint64_t from_db = (to_db < 0) 
                    ? uint64_t(to_db - LLONG_MIN)
                    : uint64_t(to_db) + (uint64_t)LLONG_MAX + 1;

If order_id is greater than 2^63 - 1, then order_id - (uint64_t)LLONG_MAX - 1 yields a value in [0, LLONG_MAX], which fits in int64_t. If not, the cast to signed is well defined, and adding LLONG_MIN shifts the value into the negative range without overflow.

During the reverse conversion, to_db - LLONG_MIN places a negative value into the [0, LLONG_MAX] range, and the other branch restores the values in [2^63, ULLONG_MAX].

The database platform or compiler you use may do something awful with the binary representation of unsigned values when converting them to signed, not to mention that different signed formats do exist.

For the same reason, inter-platform protocols often use string formatting, or represent floating-point values as integers scaled by the value of the least significant bit, i.e. as encoded fixed point.
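As a minimal sketch of the range shift described in this answer, the two directions can be wrapped in a pair of functions. The names encode_order_id and decode_order_id are hypothetical, and the sketch assumes the exact-width types from <cstdint> are available. Every intermediate value stays inside the range of its type, so no step relies on out-of-range conversion.

```cpp
#include <cstdint>

// Hypothetical helper names, chosen for illustration only.
// Maps [0, 2^64 - 1] onto [INT64_MIN, INT64_MAX] by subtracting 2^63.
std::int64_t encode_order_id(std::uint64_t id)
{
    constexpr std::uint64_t half = std::uint64_t{1} << 63;  // 2^63
    return id >= half
        ? static_cast<std::int64_t>(id - half)        // lands in [0, INT64_MAX]
        : static_cast<std::int64_t>(id) + INT64_MIN;  // lands in [INT64_MIN, -1]
}

// Inverse mapping: adds 2^63 back.
std::uint64_t decode_order_id(std::int64_t stored)
{
    constexpr std::uint64_t half = std::uint64_t{1} << 63;  // 2^63
    return stored < 0
        ? static_cast<std::uint64_t>(stored - INT64_MIN)  // lands in [0, 2^63 - 1]
        : static_cast<std::uint64_t>(stored) + half;      // lands in [2^63, 2^64 - 1]
}
```

Note that the stored number differs from the ID itself (an ID of 0 is stored as INT64_MIN), but the mapping preserves ordering, so sorting the stored column sorts the original IDs as well.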

Daniel Langr · Answer 2 · 2020-12-02T10:38:06.207

1

I would go with memcpy. It avoids (? see comments) undefined behavior, and compilers typically optimize the byte copy away:

#include <cstdint>
#include <cstring>

int64_t uint64_t_to_int64_t(uint64_t u)
{
  int64_t i;
  memcpy(&i, &u, sizeof(int64_t));
  return i;
}

to_db = uint64_t_to_int64_t(order_id);

GCC with -O2 generated the optimal assembly for uint64_t_to_int64_t:

mov rax, rdi
ret

Live demo: https://godbolt.org/z/Gbvhzh
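Loading the value back needs the reverse copy as well. The following is a minimal sketch of both directions; the names bits_to_signed and bits_to_unsigned are hypothetical and do not appear in the answer.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper names, chosen for illustration only.
static_assert(sizeof(std::int64_t) == sizeof(std::uint64_t),
              "a byte-wise copy needs equally sized types");

// Copy the bytes of an unsigned value into a signed one (before writing to the db).
std::int64_t bits_to_signed(std::uint64_t u)
{
    std::int64_t i;
    std::memcpy(&i, &u, sizeof i);
    return i;
}

// Reverse direction (after reading from the db).
std::uint64_t bits_to_unsigned(std::int64_t i)
{
    std::uint64_t u;
    std::memcpy(&u, &i, sizeof u);
    return u;
}
```

Because the same bytes are copied there and back, the round trip returns the original value. As far as I know, the exact-width types are specified as two's complement without padding bits, in which case an input above 2^63 - 1 is stored as that input minus 2^64.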

edited Dec 02 '20 at 10:38

answered Dec 02 '20 at 10:20


"It avoids undefined behavior" I would not make such a claim. – n. m. could be an AI Dec 02 '20 at 10:20
@n.'pronouns'm. Why not? – Daniel Langr Dec 02 '20 at 10:21
`memcpy` itself is OK, it's the subsequent use of the value which is problematic. C++20 may make it OK, not sure about that. – n. m. could be an AI Dec 02 '20 at 10:22
@n.'pronouns'm. Could you please explain? – Daniel Langr Dec 02 '20 at 10:24
I don't think you can put any random bit pattern in a signed integer type and expect it to work. I don't see any such guarantee in the standard. – n. m. could be an AI Dec 02 '20 at 10:26
@n.'pronouns'm. I know that cppreference is not normative, but there is exactly such a case there: https://en.cppreference.com/w/cpp/string/byte/memcpy. Note the example with a `double` and `int64_t`. It also says: _Where strict aliasing prohibits examining the same memory as values of two different types, `std::memcpy` may be used to convert the values._ – Daniel Langr Dec 02 '20 at 10:29
@n.'pronouns'm. This seems to be relevant: https://stackoverflow.com/q/51300626/580083. You're right that it is not a simple problem and people there argue about whether it is or is not UB. The two most upvoted answers differ in their opinion :-o. Another relevant question: https://stackoverflow.com/q/39595103/580083. – Daniel Langr Dec 02 '20 at 10:37
Both linked questions deal with unsigned types. Unsigned types cannot have trap representations, all bit patterns represent values, therefore memcpy'ing anything to an unsigned type and examining the result is perfectly well defined. There is no such guarantee w.r.t. signed types. Cppreference should have used an unsigned type too. – n. m. could be an AI Dec 02 '20 at 12:46

score 0 · Answer 3 · answered Dec 02 '20 at 09:39

All four methods will always work, as long as the value is within range. The first will generate warnings on many compilers, so should probably not be used. The second is more a C idiom than a C++ idiom, but is widely used in C++. The last one is ugly and relies on subtle details from the standard, and should not be used.

score 0 · Answer 4 · answered Dec 02 '20 at 10:14

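This function seems UB-free:

#include <cstdint>
#include <limits>

int64_t fromUnsignedTwosComplement(uint64_t u)
{
    if (u <= static_cast<uint64_t>(std::numeric_limits<int64_t>::max()))
        return static_cast<int64_t>(u);
    // -u - 1 lies in [0, 2^63 - 1], so neither the cast nor the negation can overflow, even for u == 2^63.
    return -static_cast<int64_t>(-u - 1) - 1;
}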

It reduces to a no-op under optimisations.

Conversion in the other direction is a straight cast to uint64_t. It is always well-defined.
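The reverse direction mentioned above can be sketched as a one-line function. The name toUnsignedTwosComplement is hypothetical, picked only to mirror the function in this answer.

```cpp
#include <cstdint>

// Hypothetical helper name, chosen for illustration only.
// Converting a signed value to an unsigned type is defined as reduction
// modulo 2^64, so a negative input i becomes 2^64 + i.
std::uint64_t toUnsignedTwosComplement(std::int64_t i)
{
    return static_cast<std::uint64_t>(i);
}
```

For example, -1 maps to 2^64 - 1 and INT64_MIN maps to 2^63, while non-negative inputs are returned unchanged.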

answered Dec 02 '20 at 10:14

n. m. could be an AI


Could you elaborate on what `-u` does in the latter cast? – Surt Dec 02 '20 at 10:45
@Surt It calculates the expression `-u` modulo `2^64`, which would be the same as `2^64-u` if we could write down the constant `2^64`. – n. m. could be an AI Dec 02 '20 at 12:37
