
Howard Chu writes:

In the latest C spec it is impossible to write a "legal" implementation of malloc or memcpy.

Is this right? My impression is that in the past, the intent (at least) of the standard was that something like this would work:

void * memcpy(void * restrict destination, const void * restrict source, size_t nbytes)
{
    size_t i;
    unsigned char *dst = (unsigned char *) destination;
    const unsigned char *src = (const unsigned char *) source;

    for (i = 0; i < nbytes; i++)
        dst[i] = src[i];
    return destination;
}

What rules in the latest C standard are violated here? Or, what part of the specification of memcpy is not correctly implemented by this code?

curiousguy
Jason Orendorff
  • It would seem to be prudent to not lose the `const` of `source` and use `const unsigned char *src`, although `src[]` does not modify anything, so I would say it is not _needed_, but `const` absence distracts from the main issue. – chux - Reinstate Monica Jan 24 '19 at 05:18
  • The _effective type_ issue seems to be relevant here - defined in C11 §6.5 6. – chux - Reinstate Monica Jan 24 '19 at 05:24
  • As far as I can tell, this Howard Chu is writing nonsense. I don't see any rules of C violated in your implementation of `memcpy`, and the "effective type" thing is not new in the latest C spec (it's been there for twenty years now). – melpomene Jan 24 '19 at 15:11

2 Answers


For the malloc function, paragraph 6.5 §6 makes it clear that a conforming and portable implementation cannot be written in C:

The effective type of an object for an access to its stored value is the declared type of the object, if any(87)...

The (non normative) note 87 says:

Allocated objects have no declared type.

The only way to create an object with no declared type is... through the allocation function, which is required to return such an object! So inside the allocation function, you must do something that the standard does not allow in order to set up a memory zone with no declared type.

In common implementations, the standard library malloc and free are indeed implemented in C, but the system knows about it and assumes that the character array which has been provided inside malloc just has no declared type. Full stop.
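
As a rough illustration of the problem, here is a minimal sketch of a toy bump allocator. The names POOL_SIZE, pool, pool_used and toy_malloc are made up for this example and do not come from any real C library. The point is that the backing storage has a declared type (an array of unsigned char), so read strictly, 6.5 §6 does not treat the returned memory as an object with no declared type; a real implementation relies on the compiler treating its allocator specially.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
#define POOL_SIZE 4096

/* The backing storage HAS a declared type: array of unsigned char. */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Toy bump allocator: hands out successive, suitably aligned slices
   of pool. Returns NULL when the request does not fit. There is no
   matching free(); this is only a sketch. */
void *toy_malloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);

    /* Reject oversized requests first so the rounding cannot overflow. */
    if (size > POOL_SIZE)
        return NULL;

    /* Round the request up to a multiple of the strictest alignment. */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > POOL_SIZE - pool_used)
        return NULL;

    void *p = &pool[pool_used];
    pool_used += rounded;
    return p;
}
```

Whether a caller may then store an int through the returned pointer is exactly the question: the bytes belong to an object declared as unsigned char[], and nothing in portable C lets toy_malloc strip that declared type.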

But the remaining part of the same paragraph explains that there is no real problem in writing a memcpy implementation (emphasis mine):

... If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

Provided you copy the object as an array of character type, which is a special access allowed per the strict aliasing rule, there is no problem in implementing memcpy, and your code is a possible and valid implementation.

IMHO Howard Chu's rant is about this good old memcpy usage, which is no longer valid (assuming sizeof(float) == sizeof(int)):

float f = 1.0;
int i;
memcpy(&i, &f, sizeof(int));         // valid: copy at byte level, but the value of i is undefined
printf("Repr of %f is %x\n", i, i);  // UB: i cannot be accessed as a float
Serge Ballesta
  • Wouldn't the value of `i` in your example be implementation defined (and possibly trap representation) rather than undefined? – Christian Gibbons Jan 24 '19 at 17:00
  • @ChristianGibbons: I would say that what happens is unspecified by the standard. An implementation can specify it, but I could not find an evidence that it is required to. – Serge Ballesta Jan 24 '19 at 17:06
  • I would think it would be implicitly implementation defined. Specifically as it relates to the representation of `float`s and the representation of `int`s. Both of those are implementation defined. `memcpy` operation to copy the bytes from the `float` to the `int` should be well-defined (copy the bytes exactly as they are). Therefore the value held by `i` should be resolvable given the implementation details of `float` and `int`, with the only caveat being the possibility of trap representation. – Christian Gibbons Jan 24 '19 at 17:18
  • But `i` has a _declared type_, so that rule does not apply. –  Jan 25 '19 at 23:27
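
The argument made in the comments above can be sketched as follows. This is only an illustration under stated assumptions: it assumes float is an IEEE 754 binary32 value with the same size as uint32_t, and it uses uint32_t rather than int as the destination because uint32_t has no padding bits and therefore no trap representations. The helper name float_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Copies the object representation of f into an unsigned integer.
   bits has a declared type (uint32_t), so the read below is an
   ordinary access to a uint32_t object. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an implementation where float is IEEE 754 binary32, float_bits(1.0f) yields 0x3F800000. The result depends on the implementation's float representation, which is the commenters' point.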

TL;DR
It should be fine, as long as the memcpy is based on a naive character-by-character copy and is not optimized to move chunks of the size of the largest aligned type that can be copied in a single instruction. The latter is how standard library implementations do it.


What's concerning is something like this scenario:

void* my_int = malloc(sizeof(int)); // sizeof *my_int would be sizeof(void), which is not valid C
int another_int = 1;

my_memcpy(my_int, &another_int, sizeof(int));

printf("%d", *(int*)my_int); // well-defined or strict aliasing violation?

Explanation:

  • The data pointed at by my_int has no declared type, and so no effective type yet.
  • When we copy the data into the my_int location, one might be concerned that we force the effective type to become unsigned char, since that's what my_memcpy uses.
  • Then, when we read that memory location through int*, would we violate strict aliasing?

However, the key here is a special exception in the rule for effective type, specified in C17 6.5/6, emphasis mine:

If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.

Since we do copy the array as character type, the effective type of what my_int points at will become that of the object another_int from which the value was copied.

So everything should be fine.
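
For reference, the scenario above can be put together as a small compilable sketch. my_memcpy mirrors the byte-wise copy from the question; copy_and_read is a hypothetical helper name introduced only for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Naive byte-by-byte copy, same shape as the code in the question. */
void *my_memcpy(void *restrict destination, const void *restrict source,
                size_t nbytes)
{
    unsigned char *dst = destination;
    const unsigned char *src = source;

    for (size_t i = 0; i < nbytes; i++)
        dst[i] = src[i];
    return destination;
}

/* Hypothetical helper: copies value into freshly allocated storage,
   reads it back through an int lvalue, and stores the result in *out.
   Returns 0 on success, -1 if the allocation fails. */
int copy_and_read(int value, int *out)
{
    int *my_int = malloc(sizeof *my_int);
    if (my_int == NULL)
        return -1;

    /* Copied as an array of character type: per 6.5/6 the allocated
       object takes on the effective type of the source, which is int. */
    my_memcpy(my_int, &value, sizeof value);

    *out = *my_int;  /* read through int*, matching the effective type */
    free(my_int);
    return 0;
}
```

Calling copy_and_read(1, &out) is expected to succeed and leave 1 in out, since the read through int* matches the effective type the allocated object acquired from the copy.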

In addition, you restrict-qualified the parameters, so there should be no fuss about whether the two pointers might alias each other, just like the real memcpy.

Notably, this rule has remained the same through C99, C11 and C17. One might argue that it is a very bad rule abused by compiler vendors, but that's another story.

Lundin