To convert a byte array received from a big-endian machine into an integer, we can use:

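/* Big-endian input: data[0] is the most significant byte.
   DATA_SIZE must not exceed sizeof(unsigned long long). */
unsigned long long convert(const unsigned char data[]) {
  unsigned long long res = 0;   /* unsigned: shifting into the top bit is well defined */
  for (int i = 0; i < DATA_SIZE; ++i)
    res = (res << 8) | data[i];
  return res;
}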
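The same big-endian loop can be sketched with its assumptions made explicit: the byte count is passed as a parameter (the hypothetical `n`, playing the role of `DATA_SIZE`), and the accumulator is `unsigned long long`, because left-shifting a signed `long long` past its maximum value is undefined behavior in C. The name `decode_be` is hypothetical, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: interpret n bytes, most significant first, as an
 * unsigned integer. Valid for n <= sizeof(unsigned long long). */
static unsigned long long decode_be(const unsigned char *data, size_t n)
{
    assert(n <= sizeof(unsigned long long));
    unsigned long long res = 0;
    for (size_t i = 0; i < n; ++i)
        res = (res << 8) | data[i];   /* shift earlier bytes up, append next one */
    return res;
}
```

For example, the bytes `{0x12, 0x34, 0x56, 0x78}` decode to `0x12345678` on any host, because the loop only uses arithmetic on values and never looks at how the host lays out an integer in memory.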

If the other machine is little-endian, we can use:

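/* Little-endian input: data[0] is the least significant byte.
   DATA_SIZE must not exceed sizeof(unsigned long long). */
unsigned long long convert(const unsigned char data[]) {
  unsigned long long res = 0;
  for (int i = DATA_SIZE - 1; i >= 0; --i)
    res = (res << 8) | data[i];   /* most significant byte is processed first */
  return res;
}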
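A little-endian counterpart can be sketched the same way, again assuming the byte count is passed in as the hypothetical parameter `n` and using the hypothetical name `decode_le`. With an unsigned `size_t` index the original `i >= 0` test would never become false, so the loop uses the `i-- > 0` idiom to count down safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: interpret n bytes, least significant first, as an
 * unsigned integer. Valid for n <= sizeof(unsigned long long). */
static unsigned long long decode_le(const unsigned char *data, size_t n)
{
    assert(n <= sizeof(unsigned long long));
    unsigned long long res = 0;
    /* Walk from the last (most significant) byte down to the first. */
    for (size_t i = n; i-- > 0; )
        res = (res << 8) | data[i];
    return res;
}
```

Here the bytes `{0x78, 0x56, 0x34, 0x12}` decode to `0x12345678`, independent of the receiving host's byte order.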

Why do we need the above functions? Shouldn't we just use hton on the sender and ntoh on the receiver? Is it because hton/ntoh convert integers, while this convert() works on a char array?
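On the sending side, a byte-oriented alternative to `htonl` is to write the bytes out explicitly, most significant first. A minimal sketch, with the hypothetical name `encode_be` and the assumption that at most 8 bytes are written:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the low n bytes of value into out,
 * most significant byte first (big-endian / network order). */
static void encode_be(unsigned long long value, unsigned char *out, size_t n)
{
    assert(n <= sizeof(unsigned long long));
    /* Fill from the end so the least significant byte lands in out[n-1]. */
    for (size_t i = n; i-- > 0; ) {
        out[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}
```

Because it only uses shifts and masks on the value, the output bytes do not depend on the sender's own endianness, and a receiver can reassemble them with the big-endian `convert()` shown in the question.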

user389955

2 Answers

The hton/ntoh functions convert between network byte order (big-endian) and host byte order. If the two are the same, as on big-endian machines, these functions do nothing, so they cannot portably be relied upon to swap byte order. Also, as you pointed out, they are only defined for 16-bit (htons/ntohs) and 32-bit (htonl/ntohl) integers, whereas your code can handle anything up to sizeof(long long) bytes, depending on how DATA_SIZE is set.
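To illustrate why `htonl`/`ntohl` are not a general byte swapper, here is a small sketch assuming a POSIX system (`<arpa/inet.h>`); the helper names are hypothetical. `host_is_big_endian()` inspects the lowest-addressed byte of a known 16-bit value, and `to_network_bytes()` shows that after `htonl` the bytes in memory are in big-endian order on every host. Only the work `htonl` does differs: on a big-endian host it returns its argument unchanged.

```c
#include <arpa/inet.h>   /* htonl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 if the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of probe */
    return first == 0x01;
}

/* Hypothetical helper: the four bytes that htonl(value) occupies in memory.
 * These are big-endian on every host. */
static void to_network_bytes(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value);
    memcpy(out, &net, sizeof net);
}
```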

TypeIA
  • @dvnrrs: Let me ask another way. When I receive an int from another machine (which could be any machine), should I use ntoh or convert()? From your answer, it seems ntoh and convert() do the same thing, except that ntoh can only handle short and int, not long long or a byte array. Thanks – user389955 Feb 24 '14 at 20:24
  • @user389955 It depends on the network protocol! If the sender promises to give you a *big-endian* (= "network order") value, and it's a 16-bit or 32-bit integer, you can safely use ntohs/ntohl. Otherwise you need some custom code like what you posted. – TypeIA Feb 24 '14 at 20:28
  • @user389955, you should never receive an `int` from another machine, and you should never use `ntoh` and the like. Rather, data from another machine should always be handled as `char[]` until it is combined into a native integer. IIRC, all modern compilers will optimize the functions in the original question at least as well as they will optimize `ntoh` et al. – BCS May 02 '23 at 04:15

Over the network you always receive a series of bytes (octets), which you can't pass directly to ntohs or ntohl. Supposing the incoming bytes are buffered in the (unsigned) char array `buf`, you could write `uint16_t x = ntohs(*(uint16_t *)(buf + offset));`, but this is not portable unless the address `buf + offset` is always even, so that the read is correctly aligned. Similarly, `uint32_t y = ntohl(*(uint32_t *)(buf + offset));` requires the address `buf + offset` to be a multiple of 4. (The cast should be to a 32-bit type such as `uint32_t`: `ntohl` operates on 32-bit values, while `long` is 8 bytes on many 64-bit platforms.) Your convert() functions don't have this limitation: they can process a byte sequence at any (unaligned) memory address.
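A common way to keep using `ntohs`/`ntohl` without the alignment restriction is to copy the bytes into a properly typed local with `memcpy`, which may read from any address and also sidesteps the strict-aliasing concern raised in the comments. A sketch assuming POSIX `<arpa/inet.h>`; the helper names `read_u16_net` and `read_u32_net` are hypothetical, and the caller is assumed to have checked that the field lies inside the buffer.

```c
#include <arpa/inet.h>   /* ntohs, ntohl (POSIX) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 16-bit field at buf + offset.
 * memcpy has no alignment requirement, so any offset works. */
static uint16_t read_u16_net(const unsigned char *buf, size_t offset)
{
    uint16_t v;
    memcpy(&v, buf + offset, sizeof v);
    return ntohs(v);
}

/* Hypothetical helper: read a big-endian 32-bit field at buf + offset. */
static uint32_t read_u32_net(const unsigned char *buf, size_t offset)
{
    uint32_t v;
    memcpy(&v, buf + offset, sizeof v);
    return ntohl(v);
}
```

Compilers such as GCC and Clang typically turn a fixed-size `memcpy` like this into a single load on targets that permit unaligned access.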

Ferenc Wágner
  • As a matter of fact, it's a bad idea to cast pointers like this because of strict aliasing. – Lapshin Dmitry Feb 18 '19 at 15:42
  • @LapshinDmitry Thanks. Since `buf` is a `char` array, aliasing problems arise only if you write into `buf` via similar casts. But it's certainly something to think of. – Ferenc Wágner Feb 19 '19 at 13:27
  • Another reason to prefer `convert()` or the like is that it avoids the question of what byte order a (non-`char`) "thing" is in. With `convert()` you only deal with bytes or native `int`s. Sadly, this gets messy or impossible when dealing with APIs that pass structs (like `sockaddr_in`) defined to use network byte order. – BCS May 02 '23 at 04:22
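For struct-based socket APIs, the fields defined in network byte order are exactly where `htons`/`htonl` belong. A minimal POSIX sketch, assuming an IPv4 loopback address; the helper `make_loopback_addr` is hypothetical.

```c
#include <arpa/inet.h>    /* htons, htonl */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET */

/* Hypothetical helper: build an IPv4 loopback address for the given port.
 * sin_port and sin_addr must hold network-order values, so the host-order
 * inputs are converted with htons/htonl. */
static struct sockaddr_in make_loopback_addr(uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;                      /* host order, not converted */
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    return addr;
}
```

With port 8080 (`0x1F90`), the two bytes of `sin_port` in memory are `0x1F, 0x90` on every host, which is what the kernel expects.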