Let's say I want to fill a char array with different values but, for the sake of simplicity, I just zero it:
char buff[1024];
...
for (int i = 0; i < 1024; i++) buff[i] = '\0';
Is the compiler going to optimize this to match the bus width? Or should I manually do this:
char buff[1024];
...
size_t empty = 0;
for (size_t i = 0; i < 1024 / sizeof(size_t); i++)
    memcpy(buff + i * sizeof(size_t), &empty, sizeof(size_t));
This assumes that sizeof(size_t) equals the bus width.
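For the zeroing case specifically, here is a minimal sketch of a third option: `std::memset` states the intent directly and leaves the store width to the library and the compiler. Optimizing compilers commonly recognize the plain byte loop too, but that depends on the compiler and its flags, so it is worth inspecting the generated assembly rather than assuming. The helper names `zero_by_loop`, `zero_by_memset`, and `zeroing_variants_agree` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helpers for illustration; none of these names come from the post.

// Byte-by-byte zeroing, as in the first loop of the question.
void zero_by_loop(char *buff, std::size_t n) {
    assert(buff != nullptr);
    for (std::size_t i = 0; i < n; i++) buff[i] = '\0';
}

// Same effect, expressed as a single library call.
void zero_by_memset(char *buff, std::size_t n) {
    assert(buff != nullptr);
    std::memset(buff, 0, n);
}

// Fills two buffers with non-zero bytes, zeroes each one a different way,
// and reports whether the results are byte-for-byte identical.
bool zeroing_variants_agree() {
    char a[1024];
    char b[1024];
    std::memset(a, 0x5a, sizeof a);
    std::memset(b, 0x5a, sizeof b);
    zero_by_loop(a, sizeof a);
    zero_by_memset(b, sizeof b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Calling `zeroing_variants_agree()` from `main` should return `true`; it only checks that both variants write the same bytes and says nothing about which one is faster.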
I made some measurements; I think they just confirm the points stated:
#include <cstring>   // memcpy
#include <ctime>     // clock, CLOCKS_PER_SEC
#include <iostream>

#define TIMES 512
#define SIZE 4194304  // 4 MiB

int main(void) {
    char *buff = new char[SIZE];
    int times = TIMES;
    clock_t begin = clock();
    void *pattern = (void*)0xffeeddcc;  // fill pattern used by the "pattern" variants
    while (times--) {
        ... some for loop ...
    }
    clock_t end = clock();
    delete[] buff;
    std::cout << ((float)(end - begin) / CLOCKS_PER_SEC) << " s elapsed.\n";
    return 0;
}
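As a sketch of how each loop variant could be plugged into the same timing code without copying the harness, the helper below runs a callable a given number of times and returns the elapsed seconds. It is an assumption-laden alternative, not the code used for the numbers in this post: `time_repeated` is a made-up name, and it uses `std::chrono::steady_clock` (wall-clock time) rather than `clock()` (CPU time).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `body` exactly `times` times and returns the
// elapsed wall-clock time in seconds.
template <typename Body>
double time_repeated(int times, Body body) {
    assert(times >= 0);
    auto begin = std::chrono::steady_clock::now();
    for (int t = 0; t < times; t++) body();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}
```

A call could look like `time_repeated(TIMES, [&] { for (int i = 0; i < SIZE; i++) buff[i] = i % 0xff; });`. With optimizations enabled, a compiler may drop stores whose results are never read, so reading the buffer after the timed region is a reasonable precaution.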
Set char by char:
for (int i = 0; i < SIZE; i++) buff[i] = i % 0xff;
Average elapsed time: 13.6284 s
Set fixed size at a time (bus width):
for (size_t i = 0; i < SIZE / sizeof(void*); i++) {
    void* sub = (void*)(((i * sizeof(void*)) % 0xff) + (((i * sizeof(void*) + 1) % 0xff) << 8) + (((i * sizeof(void*) + 2) % 0xff) << 16) + (((i * sizeof(void*) + 3) % 0xff) << 24));
    memcpy(buff + i * sizeof(void*), &sub, sizeof(void*));
}
Average elapsed time: 19.4352 s
Pattern char by char:
for (int i = 0; i < SIZE; i++) buff[i] = ((char*)&pattern)[i % sizeof(void*)];
Average elapsed time: 17.1696 s
Pattern fixed size (bus width):
for (int i = 0; i < SIZE / sizeof(void*); i++) memcpy(buff + i * sizeof(void*), &pattern, sizeof(void*));
Average elapsed time: 5.6976 s
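Before comparing the two pattern timings, it helps to confirm that both loops write exactly the same bytes, so the difference cannot come from doing different work. The sketch below does that check on a small buffer. The function names are made up for this illustration, and it assumes the buffer size is a multiple of `sizeof(void*)`, as `SIZE` is in the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers for illustration; the loop bodies mirror the two
// "pattern" variants from the post.

// Copies the pattern one byte at a time.
void pattern_by_char(char *buff, std::size_t size, void *pattern) {
    const char *bytes = reinterpret_cast<const char *>(&pattern);
    for (std::size_t i = 0; i < size; i++) buff[i] = bytes[i % sizeof(void *)];
}

// Copies the pattern one pointer-sized chunk at a time.
void pattern_by_word(char *buff, std::size_t size, void *pattern) {
    assert(size % sizeof(void *) == 0);  // assumption: no partial trailing chunk
    for (std::size_t i = 0; i < size / sizeof(void *); i++)
        std::memcpy(buff + i * sizeof(void *), &pattern, sizeof(void *));
}

// Fills two buffers, one per variant, and reports whether they are identical.
bool pattern_variants_agree(std::size_t size) {
    void *pattern =
        reinterpret_cast<void *>(static_cast<std::uintptr_t>(0xffeeddcc));
    std::vector<char> a(size);
    std::vector<char> b(size);
    pattern_by_char(a.data(), size, pattern);
    pattern_by_word(b.data(), size, pattern);
    return a == b;
}
```

`pattern_variants_agree(4096)` should return `true`, since both variants repeat the object representation of `pattern` every `sizeof(void*)` bytes.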
I don't know if all these measurements were necessary XD
Measured on a 2 GHz, 2-core CPU (Intel Core i3-5005U @ 2.00 GHz).