Let's say I want to fill a char array with different values; for the sake of simplicity, take this example:

char buff[1024];
...
for (int i = 0; i < 1024; i++) buff[i] = NULL;

Is the compiler going to optimize this to match the bus width? Or should I manually do this:

char buff[1024];
...
size_t empty = NULL;
for (int i = 0; i < 1024 / sizeof(size_t); i++)
    memcpy(buff + i * sizeof(size_t), &empty, sizeof(size_t));

This assumes that sizeof(size_t) equals the bus width.

I made some measurements; I think they just confirm the points stated:

#include <cstring>   // memcpy
#include <ctime>     // clock, CLOCKS_PER_SEC
#include <iostream>

#define TIMES 512
#define SIZE 4194304

int main(void) {
    char *buff = new char[SIZE];

    int times = TIMES;

    clock_t begin = clock();

    void *pattern = (void*)0xffeeddcc;

    // Repeat the fill TIMES times so the elapsed time is measurable.
    while (times--) {

        ... some for loop ...
    }

    clock_t end = clock();

    delete[] buff;

    std::cout << ((float)(end - begin) / CLOCKS_PER_SEC) << " s elapsed.\n";

    return 0;
}

Set char by char:

for (int i = 0; i < SIZE; i++) buff[i] = i % 0xff;

Average elapsed time: 13.6284 s

Set fixed size at a time (bus width):

for (int i = 0; i < SIZE / sizeof(void*); i++) {
    void* sub = (void*)(((i * sizeof(void*)) % 0xff) + (((i * sizeof(size_t) + 1) % 0xff) << 8) + (((i * sizeof(void*) + 2) % 0xff) << 16) + (((i * sizeof(void*) + 3) % 0xff) << 24));

    memcpy(buff + i * sizeof(void*), &sub, sizeof(void*));
}

Average elapsed time: 19.4352 s

Pattern char by char:

for (int i = 0; i < SIZE; i++) buff[i] = ((char*)&pattern)[i % sizeof(void*)];

Average elapsed time: 17.1696 s

Pattern fixed size (bus width):

for (int i = 0; i < SIZE / sizeof(void*); i++) memcpy(buff + i * sizeof(void*), &pattern, sizeof(void*));

Average elapsed time: 5.6976 s

I don't know if all this measurement was necessary XD

Measured on a 2 GHz dual-core CPU (Intel Core i3-5005U CPU @ 2.00 GHz).
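
For reference, here is a minimal sketch of the fastest variant above (fixed-size pattern copied with memcpy), next to the byte-by-byte version it replaces. The helper names fill_pattern and fill_pattern_bytewise are hypothetical, and the sketch assumes a 4-byte std::uint32_t pattern rather than the void* used in the measurements, so that the pattern width does not depend on the platform. It also handles buffers whose size is not a multiple of the pattern width, which the loops above do not need to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fill `buf` with repeated copies of a 4-byte pattern,
// writing one whole pattern per iteration.
void fill_pattern(char* buf, std::size_t n, std::uint32_t pattern) {
    assert(buf != nullptr || n == 0);
    const std::size_t w = sizeof pattern;
    std::size_t i = 0;
    for (; i + w <= n; i += w)
        std::memcpy(buf + i, &pattern, w);      // one full pattern at a time
    if (i < n)
        std::memcpy(buf + i, &pattern, n - i);  // leftover bytes at the end
}

// Reference version: the same result, written one char at a time.
void fill_pattern_bytewise(char* buf, std::size_t n, std::uint32_t pattern) {
    const char* p = reinterpret_cast<const char*>(&pattern);
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = p[i % sizeof pattern];
}
```

Both functions produce the same bytes in memory; which one is faster depends on the compiler and flags, so it is worth measuring with optimizations enabled.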

GuillemVS
  • But let's suppose it's not NULL every time; I only used it for the sake of simplicity. – GuillemVS May 30 '20 at 09:28
  • I explained it poorly though (I will edit) and thanks for the time! – GuillemVS May 30 '20 at 09:30
  • If you need a (potentially) different `char` value for each element, then any attempt at 'optimizing' by packing `n` chars into a temporary buffer will probably be counter-productive. – Adrian Mole May 30 '20 at 09:30
  • Not 100%, but bus width is more likely to be `sizeof(void*)`. – Adrian Mole May 30 '20 at 09:33
  • Chances are high for both implementations that the compiler's optimizer will change it to what it thinks is best and that you get the same compiler output in both cases. But as always when talking about performance: measure it! – Werner Henze May 30 '20 at 09:34
  • Okay thanks! I suppose it'll be faster if it's a constant `n` pattern? – GuillemVS May 30 '20 at 09:34

1 Answer

If you're setting each char to the same value, just call memset.
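
A minimal sketch of that call, using the 1024-byte buffer from the question. The helper names fill_same and demo_fill_same are hypothetical and only wrap std::memset for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): set every byte of `buf`
// to the same value with a single memset call.
void fill_same(char* buf, std::size_t n, unsigned char value) {
    assert(buf != nullptr || n == 0);
    if (n != 0)
        std::memset(buf, value, n);  // memset stores `value` as unsigned char
}

// Fills a 1024-byte buffer with 0xff and checks that every byte was set.
bool demo_fill_same() {
    char buff[1024];
    fill_same(buff, sizeof buff, 0xff);
    for (std::size_t i = 0; i < sizeof buff; ++i)
        if (static_cast<unsigned char>(buff[i]) != 0xff)
            return false;
    return true;
}
```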

Compilers (namely gcc and clang) do recognize loops like

for (int i = 0; i < 1024; i++) buff[i] = 0xff;

Clang turns it into a memset call; gcc emits instructions that set one word at a time: https://gcc.godbolt.org/z/ovRTPU. You get the same assembly output from calling memset directly, since it is generally a compiler-recognized function (like memcpy).

If you're setting a buffer with static storage duration to 0, you don't need to do anything, because it will already have been zeroed by the time the program starts.
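
A small sketch of the static-storage case. The names static_buff and is_all_zero are hypothetical; the point is only that a namespace-scope (or static) array is zero-initialized before main runs, whereas a plain local array like the one in the question is not and still has to be set explicitly.

```cpp
#include <cassert>
#include <cstddef>

// Static storage duration: zero-initialized before main() runs,
// so no explicit loop or memset is needed to clear it.
static char static_buff[1024];

// Hypothetical checker: returns true if every byte in `buf` is zero.
bool is_all_zero(const char* buf, std::size_t n) {
    assert(buf != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        if (buf[i] != 0)
            return false;
    return true;
}
```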

(BTW, using NULL for 0 isn't a good idea. At least in C, NULL can be (void*)0 which won't get assigned to an integer type without warnings/errors.)

Petr Skocik
  • Yes of course! But I was thinking more like a changing value or pattern. – GuillemVS May 30 '20 at 09:45
  • @GuillemVS If it's the same value for each word (like `0x1122334455667788`), then the `memcpy(...,sizeof(size_t))` strategy will definitely be faster (you should try and measure it for yourself). If not, then I believe putting the value together (with shifts and bit ors) for each word will likely make it slower than writing the `char` down directly. Why don't you try and benchmark it? – Petr Skocik May 30 '20 at 09:50
  • You mean to do measurements? I made some, but I don't know if it's going to be useful. Just for the sake of it xD – GuillemVS May 30 '20 at 10:53
  • @GuillemVS Yes. I think your benchmarks pretty much answer your own question (& you could put them in an answer). Although your type casts are weird. I don't understand how the compiler lets you get away with assigning integers cast to pointers back to integers. – Petr Skocik May 30 '20 at 10:58