Why use memcpy() when you can directly pass pointers to function in C?

Question

This is a part of the source code of BCrypt file encryption utility. Unchanged, except some comments I have added.

uLong BFEncrypt(char **input, char *key, uLong sz, BCoptions *options) {
  uInt32 L, R;
  uLong i;
  BLOWFISH_CTX ctx;
  int j;
  unsigned char *myEndian = NULL;
  j = sizeof(uInt32);

  getEndian(&myEndian);

// make room for 2 bytes at the front
  memmove(*input+2, *input, sz);

// add endian and compression option
  memcpy(*input, myEndian, 1);
  memcpy(*input+1, &options->compression, 1);

  sz += 2;    /* add room for endian and compress flags */ // total size increased

  Blowfish_Init (&ctx, key, MAXKEYBYTES); // initialize 

// encrypt the file
  for (i = 2; i < sz; i+=(j*2)) {   /* start just after tags */
    memcpy(&L, *input+i, j);// copy j bytes from input to L
    memcpy(&R, *input+i+j, j); // copy second j byte to R
    Blowfish_Encrypt(&ctx, &L, &R); // encrypt
    memcpy(*input+i, &L, j); // copy everything back
    memcpy(*input+i+j, &R, j);
  }

  if (options->compression == 1) {
    if ((*input = realloc(*input, sz + j + 1)) == NULL)
      memerror();

    memset(*input+sz, 0, j + 1);
    memcpy(*input+sz, &options->origsize, j);
    sz += j;  /* make room for the original size      */
  }

  free(myEndian);
  return(sz);
}

In the loop, we first copy j bytes at a time from the file buffer into new variables, apply Blowfish encryption, and then copy the bytes back into the buffer. Why can't I pass bytes directly to the encrypting function? Why is memcpy() even required?

"Why can't I pass bytes directly to the encrypting function?" because it's the wrong type? The function takes a pair of pointers to uint32_t, not bytes. — Masklinn, Feb 23 '21 at 15:01
Because you can't take a `char *` pointer, cast it to `uint32_t *` and treat what it points to as a `uint32_t`. [***NO YOU CAN'T***](https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64/47512025#47512025). Not even on x86. It's not safe, it's [a violation of strict aliasing](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule), and anyone who says it's safe is flat-out wrong. "But it works for me" is just another way of saying, "I haven't observed a failure yet" — Andrew Henle, Feb 23 '21 at 15:08
@AndrewHenle I'm aware of the strict aliasing rule, but isn't `char*` an exception to that? https://stackoverflow.com/a/99010/6699433 — klutt, Feb 23 '21 at 15:15
@klutt That exception goes one way. Yes, you can index a `uint32_t` as a `char` array, but you cannot treat a `char` array as a `uint32_t` — Christian Gibbons, Feb 23 '21 at 15:16
@klutt You can treat anything as an array of `char`, yes. But you **can not** treat an array of `char` as anything else. — Andrew Henle, Feb 23 '21 at 15:16
@ChristianGibbons Yeah, I just came to the part that it was a one-way :D — klutt, Feb 23 '21 at 15:17
Thanks, everyone for pointing that out, I really missed that point, sadly :( — Avinal, Feb 23 '21 at 16:12

Eric Postpischil · Accepted Answer · 2021-02-23T15:25:22.317

Why can't I pass bytes directly to the encrypting function?

There are two rules against it, or at least not supporting it.

The first is that converting a pointer to char to a pointer to an int has undefined behavior if the alignment is not correct for an int, and, even if the alignment is correct, the value of the result is not fully defined. Rules about this are in C 2018 6.3.2.3, which covers pointer conversions.

Commonly, objects such as int are required to be located at multiples of four bytes. This is because of how computer memory and the data bus are organized; the various “wires” involved are set up to transfer things in groups of certain sizes and alignments. When the compiler for such a system generates instructions to work with int objects, it generates instructions that load aligned words. If you take a char pointer that is not aligned and convert it to a pointer to int, some processors will generate a trap when a load-aligned-word instruction attempts to use an unaligned address. Other processors may ignore the low bits of the address and load an aligned word from a different address.

Even if the address is aligned, the C standard does not guarantee the result of converting a char * to an int * actually points to the same place as the original. This is because in some systems, mostly archaic now, pointers to different types were represented in different ways. Some systems access memory only in words of multiple bytes, so, to implement char *, a compiler has to synthesize addresses different from the hardware addresses, whereas, for int *, a compiler might use the hardware address directly.
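As a minimal sketch of how memcpy() sidesteps both problems: the helpers below (load_u32 and store_u32 are hypothetical names, not part of BCrypt) move a 32-bit value to or from any byte address without ever converting the char pointer to a pointer to an integer type. They assume uint32_t exists on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value starting at any byte address.
   memcpy copies bytes, so p does not need to be aligned for uint32_t. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: write a 32-bit value starting at any byte address. */
void store_u32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

In BFEncrypt the encrypted data starts at offset 2, after the two flag bytes, so a block such as *input+2 has no particular reason to be aligned for a 32-bit integer. A call like load_u32(buf + 2) remains well defined either way.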

The second rule is that memory designated for use as one type, such as an array of char, may not be freely used as another type, such as int. This rule is in C 2018 6.5, paragraph 7. It allows specific situations, such as accessing an object of any type, such as int or float, as char, but not vice versa. A purpose for this rule is so that a routine passed an int *i and a float *f can know that in code like this:

for (int j = 0; j < 1024; ++j)
    f[j] += *i;

the f[j] always accesses a float and never accesses an int, so the body of this loop never changes the value of *i. This means the compiler can optimize the code to:

int t = *i;
for (int j = 0; j < 1024; ++j)
    f[j] += t;

which saves the work of repeatedly loading *i from memory, because the temporary object t can be kept in a processor register. (On top of that, the compiler could actually use float t = *i;, saving both the repeated loading of *i from memory and the repeated conversion to float for the addition.)
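For reference, the two loop forms can be written as complete functions. This is only a sketch: the function names and the length parameter n are made up for illustration, and whether a given compiler actually performs the hoisting depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* As written in the source: *i is formally read on every iteration. */
void add_reload(float *f, const int *i, size_t n)
{
    assert(f != NULL && i != NULL);
    for (size_t j = 0; j < n; ++j)
        f[j] += *i;
}

/* Hand-hoisted form: one load and one int-to-float conversion.
   Equivalent to add_reload only because a float access may not
   modify the int that i points to. */
void add_hoisted(float *f, const int *i, size_t n)
{
    assert(f != NULL && i != NULL);
    float t = (float)*i;
    for (size_t j = 0; j < n; ++j)
        f[j] += t;
}
```

When f and i point to distinct objects, both functions leave the same values in f.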

You may look at that motivation, then look at Blowfish_Encrypt, and conclude that Blowfish_Encrypt never benefits from this potential optimization, perhaps because it never operates on mixed types that the rule would affect. However, the effects of compiler optimizations become harder to see as compilers become more advanced and aggressive in their transformations, so it is easy to miss some advantage the compiler gets from the rule about aliasing one type as another. In any event, because the rule exists, you have no assurance that your program will work if you violate it.
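To make the pattern concrete, here is a self-contained sketch of the same copy-in, transform, copy-out loop. toy_encrypt is a hypothetical stand-in for Blowfish_Encrypt so that the example compiles without the BCrypt sources; it is not a cipher. process_blocks is likewise an invented name, and the sketch assumes the length is a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Blowfish_Encrypt: swaps the halves and
   XORs one of them with a constant. NOT a real cipher. */
void toy_encrypt(uint32_t *L, uint32_t *R)
{
    uint32_t t = *L ^ 0xA5A5A5A5u;
    *L = *R;
    *R = t;
}

/* Transform len bytes of buf in 8-byte blocks. Each block is copied
   into properly typed temporaries, transformed, and copied back,
   so buf is only ever accessed as bytes. */
void process_blocks(unsigned char *buf, size_t len)
{
    assert(buf != NULL && len % 8 == 0);
    for (size_t i = 0; i < len; i += 8) {
        uint32_t L, R;
        memcpy(&L, buf + i, sizeof L);
        memcpy(&R, buf + i + sizeof L, sizeof R);
        toy_encrypt(&L, &R);
        memcpy(buf + i, &L, sizeof L);
        memcpy(buf + i + sizeof L, &R, sizeof R);
    }
}
```

The copies are usually cheap: optimizing compilers typically turn a small fixed-size memcpy() like these into a single load or store instruction, so the defined-behavior version rarely costs anything over the pointer cast.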

Wow, this is a really nice description. Special thanks for the last point, it is really important. One more thing, I am implementing this whole routine in C++, is there any optimization/simplification I can benefit from? — Avinal, Feb 23 '21 at 16:09
