2

I'm trying to adapt the following version of the stpcpy function to use restrict-qualified pointers as its arguments and internally, but I'm not sure if simply adding the qualifier would result introduce undefined behavior.

#define ALIGN (sizeof(size_t)-1)
#define ONES ((size_t)-1/UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX/2+1))
#define HASZERO(x) ((x)-ONES & ~(x) & HIGHS)

char *__stpcpy(char *d, const char *s)
{
        size_t *wd;
        const size_t *ws;

        if (((uintptr_t)s & ALIGN) == ((uintptr_t)d & ALIGN)) {
                for (; (*d=*s) && ((uintptr_t)s & ALIGN); s++, d++);
                if (!*s) return d;
                wd=(void *)d; ws=(const void *)s;
                for (; !HASZERO(*ws); *wd++ = *ws++);
                d=(void *)wd; s=(const void *)ws;
        }
        for (; (*d=*s); s++, d++);

        return d;
}

Assuming the rules in C99 6.7.3.1 about accessing an object pertain only to the individual object accessed and not the whole array, I think it may be fine, as the elements written are only accessed once, and only for writing. But I'm rather uncomfortable with using restrict at this point and don't want to rely on just my own judgement.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • Is `sizeof(size_t)` guaranteed to be a power of 2? – Nemo Sep 04 '12 at 01:35
  • In general, no, but on the systems where this code is used (it's system library level code, not application code), it will always be either 4 or 8, and `size_t` is the "native word" type. – R.. GitHub STOP HELPING ICE Sep 04 '12 at 01:38
  • I've been digging into the spec for restrict a bit more and it mostly makes sense to me now, and seems like it should be okay to just add restrict on the arguments. However, the one thing that bothers me is the definition of "based on" and how it interacts with the code that checks the alignment. When asking whether `wd` is "based on" `d`, for example, the act of "modifying P to point to a copy of the array object into which it formerly pointed" could change whether the code is even executed (by changing the alignment of the pointer). – R.. GitHub STOP HELPING ICE Sep 04 '12 at 05:04
  • as a question of style I wouldn't do the casts with void pointers, this is just obfuscating what is going on – Jens Gustedt Sep 04 '12 at 06:59
  • @Jens: The reason I use that style is that it makes it easy to change the type of ws/wd without having to change any other code (and the code still does the right thing). Actually as-written `ALIGN` would need changing too, but I'm planning to rewrite that condition as `(uintptr_t)s % sizeof *ws == (uintptr_t)d % sizeof *wd` – R.. GitHub STOP HELPING ICE Sep 04 '12 at 13:44
  • You might like this blog post: http://blog.frama-c.com/index.php?post/2012/08/02/restrict-not-modular – Pascal Cuoq Sep 05 '12 at 20:36
  • It's an interesting post, but I disagree with the view that this is a deficiency in `restrict`. In C, it's extremely rare that the contract of a function interface can be expressed in its prototype. The only functions for which that is possible are ones whose behavior is well-defined on the entire domain of the arguments. Just like any function that takes pointer arguments must set down in its contract rules for what the pointers must point to, any function that uses restrict should state the ranges of offsets it accessed from each restrict-qualified pointer. – R.. GitHub STOP HELPING ICE Sep 06 '12 at 03:24

1 Answers1

3

To comply to the standard the only restriction is that all pointer expression that manipulate any of the objects that the function received through the restrict pointers should be based on that pointer. It doesn't require the pointer expression to be of the same type. So in that sense accessing the object through these size_t* isn't a constraint violation or UB.

I wouldn't be sure if reading the value of a part of the object that you modified through *wd with *d wouldn't alias, since the pointer types differ. But as you say, you don't do that, so this should be safe.

BTW, the code makes one important assumption that is not necessarily portable, namely that the low bits of an uintptr_t reflect the alignment properties of the pointer that you converted to it. This is probably true on all architectures that we encounter in daily use, but not guaranteed by the standard. Even C11 only says something about "multiples of a byte address" where there is nothing that specifies what that would be.

Jens Gustedt
  • 76,821
  • 6
  • 102
  • 177
  • I was having trouble understanding the definition of "based on". Now that I see it's just a very clever way of expressing the notion of being derived from the original restrict pointer in any way, things are a lot more clear. – R.. GitHub STOP HELPING ICE Sep 04 '12 at 13:37
  • As for paragraph 2 of your answer, one of the types is a character type so they're allowed to alias. And actually for the last byte of the first loop, it's written twice, once in the first loop and once in the second. This is unintended and ugly so I think I'll fix it. – R.. GitHub STOP HELPING ICE Sep 04 '12 at 13:38
  • As for paragraph 3, I agree with your evaluation; the conversion to `uintptr_t` is implementation-defined, and integers/pointers could for example be opposite-endian. The assumption is true everywhere the code will be used, however, and I know of no other way to test for alignment. – R.. GitHub STOP HELPING ICE Sep 04 '12 at 13:40
  • Yes, it "implementation defined" is probably ok, if as in your case you are actually providing things in the C library (or kind of), so something that is part of the implementation. Would be good though if you could come up with a compile time assertion for that. And put a big red warning button in the code. – Jens Gustedt Sep 05 '12 at 17:01