2

So I'm trying to implement my personal MemSet that will do the same as memset but also:

  • Copy word size chunks when possible, instead of byte by byte.

  • Guarantee dest's alignment

  • Test for all alignment possibilities

So this is my code:

void *MemSet(void *dest, int c, size_t n)
{
    unsigned char *runner = (unsigned char *)dest;
    
    size_t i = 0;
    
    unsigned char swap_word[sizeof(size_t)];
    
    for (i = 0; i < sizeof(size_t); ++i)
    {
        swap_word[i] = (unsigned char)c;
    }
    
    if (NULL == dest)
    {
        return (NULL);
    }
    
    while (n > 0)
    {
        /* setting byte by byte */
        if (n < sizeof(size_t) || (((size_t)runner & (sizeof(size_t) - 1)) != 0))
        {
            *runner++ = (unsigned char)c;
            --n;
            printf("Byte written\n"); /* for debugging */
        }
        else
        {
            /* setting a whole word */
            *((void **)runner) = *((void **)swap_word);
            runner += sizeof(size_t);
            n -= sizeof(size_t);
            printf("Word written\n"); /* for debugging */
        }
    }
    return (dest);
}

What am I doing here?

  • creating an unsigned char pointer runner to run over the dest but without changing it address so I'll be able to return it as a return value.

  • creating a ready swap_word array, with the size of sizeof(size_t) because this size determines whether my machine is 32 or 64 bit (and therefore the WORD size is 4 or 8 bytes. This array will swapped when I'll need to set a word.

  • running a simple while loop that will check if there are more than sizeof(size_t) bytes left to set , if not, it means for sure we won't be able to set a whole word and then set them byte by byte.

  • Another option to set the bytes byte by byte is if the address isn't divides by 4 or 8 (again, depends on the machine), which means I won't be able to set up a word without crossing WORD Boundary, so just set them byte by byte until I'll reach an aligned address.

  • Only option to set up a whole word is only if the data is already aligned to the WORD size of the machine, and then just set up 8 bytes (just set them up using the array of swap_word we made earlier, and advance 8 more addresses. I'll do it by using casting of

    *((void **)runner) = *((void **)swap_word);

and this is my test file:

int array[] = { 2, 3 };
    
int main () 
{
    for (i = 0; i < 2; i++)
    {
        printf("Before MemSet, target is \"%d\"\n\n", array[i]);
    }
    if (NULL == MemSet(array, 3, 2 * sizeof(int)))
    {
        fprintf(stderr,"MemSet failed!\n");
        
    }
    for (i = 0; i < 2; i++)
    {
        printf("After MemSet, target is \"%d\"\n\n", array[i]);
    }
    return (0);
}

Output is:

Before Memset, target is "2"

Before Memset, target is "3"

Word written
After Memset, target is "50529027"

After Memset, target is "50529027"

Why aren't the elements are '3'? both of them? I'm using here

MemSet(array, 3, 2 * sizeof(int))

Which, by theory, needs to set up both of the elements as 3 because the array uses 2*sizeof(int) spaces in the memory, and I set up all of them as 3.

What do you think? And also, how can I check if my alignment works?

Thanks.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
NoobCoder
  • 513
  • 3
  • 18
  • 2
    The overhead of your code is defeating all of the advantages it might be having over a simple byte-wise memset. – Eugene Sh. Mar 08 '21 at 23:11
  • @EugeneSh. I'm just following the instructions of my task, I have to add this option of setting a whole word if that's possible. – NoobCoder Mar 08 '21 at 23:12

1 Answers1

4

Your function has multiple problems:

  • you test for word size move at each iteration, which is likely slower than the simple byte operation.

  • *((void * *)runner) = *((void **)swap_word); is incorrect because it violates the aliasing rule and because swap_word might not be correctly aligned for the void * type.

You should run separate loops:

  • the first one to align the destination pointer
  • the second one to set full words, possibly more than one at a time
  • the last one to set the trailing bytes if any

Here is an example:

#include <limits.h>
#include <stdio.h>
#include <stdint.h>

// assuming uintptr_t has no padding bits
void *MemSet(void *dest, int c, size_t n) {
    if (dest != NULL) {
        unsigned char *p = dest;
        if (n >= sizeof(uintptr_t)) {
            // align destination pointer
            // this test is not fully defined but works on all classic targets
            while ((uintptr_t)p & (sizeof(uintptr_t) - 1)) {
                *p++ = (unsigned char)c;
                n--;
            }
            // compute word value (generalized chux formula)
            uintptr_t w = UINTPTR_MAX / UCHAR_MAX * (unsigned char)c;
            // added a redundant (void *) cast to prevent compiler warning
            uintptr_t *pw = (uintptr_t *)(void *)p;
            // set 16 or 32 bytes at a time
            while (n >= 4 * sizeof(uintptr_t)) {
                pw[0] = w;
                pw[1] = w;
                pw[2] = w;
                pw[3] = w;
                pw += 4;
                n -= 4 * sizeof(uintptr_t);
            }
            // set the remaining 0 to 3 words
            while (n >= sizeof(uintptr_t)) {
                *pw++ = w;
                n -= sizeof(uintptr_t);
            }
            p = (unsigned char *)pw;
        }
        // set the trailing bytes
        while (n --> 0) {
            *p++ = (unsigned char)c;
        }
    }
    return dest;
}

Note however that the above code is unlikely to beat memset() because:

  • the compiler may expand the above logic inline for constant sizes, skipping the alignment tests if the destination pointer is known to be aligned or if the CPU allows unaligned access.
  • the library may use specialized instructions such as SIMD or REP/STOS to increase throughput depending on the actual target CPU.

The reason for the surprising results is int spans 4 bytes, each of which gets set to 3, so the resulting value for the integer is 0x03030303, which is exactly 50529027.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Candidate alternative to `(uintptr_t)0x0101010101010101` --> `UINTPTR_MAX/0xFF`. Now not limited to `sizeof(uintptr_t) <= 8`. – chux - Reinstate Monica Mar 09 '21 at 00:21
  • @chux-ReinstateMonica: thanks! I found an even more general formula, allowing for non 8-bit bytes... Only assuming no padding bits. – chqrlie Mar 09 '21 at 00:26
  • `while ((uintptr_t)p & (sizeof(uintptr_t) - 1))` is a reasonable, but not specified way to attain alignment. No way to do it spec-wise in C. – chux - Reinstate Monica Mar 09 '21 at 00:26
  • @chux-ReinstateMonica: I could use `while ((uintptr_t)p & (alignof(uintptr_t) - 1))` but it would be less portable. – chqrlie Mar 09 '21 at 00:27
  • I like the `4` idea, but maybe use a `#define CHUNK 4` or the like. – chux - Reinstate Monica Mar 09 '21 at 00:28
  • @chux-ReinstateMonica: good idea, but then I would need to rely on loop unrolling. – chqrlie Mar 09 '21 at 00:29
  • `(uintptr_t)p & anything` is assuming a math property of the converted pointer - pointer values are not specified as linear, yet most certainly are. Nice work IAC. – chux - Reinstate Monica Mar 09 '21 at 00:30
  • I'm not allowed to use `uintptr_t` , that's why I'm using `size_t` – NoobCoder Mar 09 '21 at 07:54
  • so I changed every place you wrote it to `size_t`. It compiles and runs but regarding `strings` being sent to the function - it does not work. Regarding int, it does work, for the example above sending an `int array` of 2 elements, it sets the number `24` for the first elements, but the 2nd one is `0`. Instead of them both being `3` – NoobCoder Mar 09 '21 at 08:00
  • 1
    @NoobCoder: I added an explanation for the `50529027` value you observe. There must be another problem when copying the above code to get the result in the comment. Especially what do you mean by *regarding strings being sent to the function - it does not work*? – chqrlie Mar 09 '21 at 08:14
  • @NoobCoder: if you change all the `uintptr_t` to `size_t`, you must also change `UINTPTR_MAX` to `SIZE_MAX`. – chqrlie Mar 09 '21 at 08:19
  • @chqrlie ok it works, but it also worked before (thanks to your explanaion about the `int value` I now understand that my code also workes as needed). You made 3 loops that do the same as my one loop (in my opinion) – NoobCoder Mar 09 '21 at 08:59
  • @NoobCoder: your loop performs 3 tests per word and has an alignment issue on `swap_word`, but it will work for current intel processors. My proposal reduces the number of tests for large sizes to about 0.25 per word and skips overhead for small sizes... It is more work but may be more efficient on average, depending on the application's specific use. – chqrlie Mar 09 '21 at 09:09
  • @chqrlie Last question - Can you explain me why `while ((uintptr_t)p & (sizeof(uintptr_t) - 1))` is valid and finds whether the address is aligned to 4 / 8 (based on the machine)? Why the `-1` is required here? – NoobCoder Mar 09 '21 at 11:22
  • 1
    @NoobCoder: `sizeof(uintptr_t)-1` is a bitmask with all bits set below the alignment value. `while ((uintptr_t)p & mask)` is equivalent to `while (((uintptr_t)p & mask) != 0)` tests whether the pointer is unaligned, ie: has one of the low bits set. – chqrlie Mar 09 '21 at 11:44