1

If there is a buffer that is supposed to pack 3 integer values, and you want to increment the one in the middle, the following code works as expected:

#include <iostream>
#include <cstring>

int main()
{
    char buffer[] = {'\0','\0','\0','\0','A','\0','\0','\0','\0','\0','\0','\0'};
    
    int tmp;

    std::memcpy(&tmp, buffer + 4, sizeof tmp); // unpack buffer[4..7] into tmp
    std::cout << buffer[4];                    // prints A

    tmp++;
    std::memcpy(buffer + 4, &tmp, sizeof tmp); // pack tmp back into buffer[4..7]
    std::cout << buffer[4];                    // prints B (on a little-endian machine)

    return 0;
}

To me, this looks like too many operations for the simple task of modifying some data in a buffer: pushing a new variable onto the stack, copying the relevant region of the buffer into it, incrementing it, and then copying it back into the buffer.

I was wondering whether it's possible to cast the 5:8 range from the byte array to an int* variable and increment it, for example:

  int *tmp = reinterpret_cast < int *>(buffer[5:8]);
  (*tmp)++;

This looks more efficient, since it avoids the two memcpy calls.

Arthur O.

2 Answers

2

The latter approach is technically undefined behavior, though it's likely to work on any sane implementation. Your syntax is slightly off, but something like this will probably work:

int* tmp = reinterpret_cast<int*>(buffer + 4);
(*tmp)++;

The problem is that it runs afoul of C++'s strict aliasing rules. Essentially, you're allowed to treat any object as an array of char, but you're not allowed to treat an array of char as anything else. Thus to be fully compliant you need to take the approach you did in the first snippet: treat an int as an array of char (which is allowed) and copy the bytes from the array into it, manipulate it as desired, and then copy back.
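If the repetition of the two memcpy calls is the main annoyance, they can be wrapped in a small helper. The following is only a sketch: `increment_int_at` and its `size` parameter are hypothetical names, not something from the question, and the bounds check is an added assumption about how the buffer is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): increments the int stored
// at byte `offset` of `buf`, where `size` is the buffer length in bytes.
inline void increment_int_at(char* buf, std::size_t size, std::size_t offset)
{
    // Assumed precondition: the whole int must lie inside the buffer.
    assert(offset <= size && size - offset >= sizeof(int));

    int value;
    std::memcpy(&value, buf + offset, sizeof value); // bytes -> int
    ++value;
    std::memcpy(buf + offset, &value, sizeof value); // int -> bytes
}
```

With the buffer from the question, `increment_int_at(buffer, sizeof buffer, 4);` would do the same work as the original snippet. Because the bytes are copied rather than reinterpreted, the offset does not have to be a multiple of `alignof(int)`.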


I would note that if you're concerned with runtime efficiency, you probably shouldn't be. Compilers are very good at optimizing these sorts of things, and will likely end up just manipulating the bytes in place. For instance, clang with -O2 compiles your first snippet (with std::cout replaced with printf to avoid stream I/O overhead) down to:

mov     edi, 65
call    putchar
mov     edi, 66
call    putchar


Remember, when writing C++ you are describing the behavior of the program you want the compiler to write, not writing the instructions the machine will execute.

Miles Budnek
  • Nice explanation with the demo; compilers optimize a lot of code these days. I am still slightly confused about the strict aliasing rule. Isn't that rule technically broken when I memcpy a section of the char array to an int variable? It looks like I am treating 4 chars as an int in the first approach too. – Arthur O. Sep 07 '22 at 08:33
1

Simply change buffer[5:8] to buffer + 4, just like in your memcpy() calls, and then it will likely work the way you want:

int *tmp = reinterpret_cast<int*>(buffer + 4 /* or: &buffer[4] */);
(*tmp)++;

Alternatively, you can use a reference instead of a pointer:

int &tmp = reinterpret_cast<int&>(buffer[4] /* or: *(buffer+4) */);
tmp++;

However, note that either approach is technically undefined behavior, as accessing the array like this violates the strict aliasing rules. The memcpy() approach is the safe and standard way to go, and compilers are very good at optimizing memcpy() calls.

That said, the reinterpret_cast approach will likely work nonetheless, depending on your compiler.

Remy Lebeau
  • If the array is aligned, the `int` object would have been implicitly created, so there is no strict aliasing issue – Artyer Sep 06 '22 at 23:10
  • "*the `int` object would have been implicitly created*" - technically, only in C++20 and later, not in earlier versions. – Remy Lebeau Sep 07 '22 at 00:11
  • @Artyer `If the array is aligned` In the example, there is nothing guaranteeing the array to be aligned though. – eerorika Sep 07 '22 at 00:23
  • @Artyer by `array is aligned` you mean something like `alignas(int) char buf[] = {...}`? – Arthur O. Sep 07 '22 at 08:51
  • Update: just tried both with `alignas(int)` and without (1-byte aligned). After the reinterpret_cast to `int` at `buffer[4]`, the returned value is an integer that interprets the 4 bytes starting from `buffer[4]`. This is probably UB, because with 1-byte alignment it should've interpreted only the byte at address `buffer[4]`, without the next 3. – Arthur O. Sep 07 '22 at 09:23
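
On the alignment question raised in these comments: alignment only constrains which addresses are suitable for an `int`; it does not change how many bytes an access through an `int*` covers, which is always `sizeof(int)`. Seeing four bytes interpreted in both experiments is therefore what one would expect. A rough way to check whether a given address is suitably aligned is sketched below. `is_aligned_for` is a hypothetical helper, and the check assumes the common case where converting a pointer to `std::uintptr_t` yields the plain numeric address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports whether `p` is a suitably aligned address
// for an object of type T. Assumes the pointer-to-integer conversion
// yields the numeric address, as on mainstream platforms.
template <typename T>
bool is_aligned_for(const void* p)
{
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

For a buffer declared as `alignas(int) char buffer[12]`, `is_aligned_for<int>(buffer + 4)` would be expected to return true on platforms where `alignof(int)` divides 4. Passing this check does not by itself make the reinterpret_cast approach well-defined; it only rules out the misalignment problem.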