This may seem like a stupid/obvious question to some of you, but I'm still learning so please be gentle haha.
I'm writing an application without the CRT, so I have to implement my own memcpy function. After getting everything working, I noticed the application was significantly slower than its CRT-based counterpart. I eventually tracked the slowdown to my custom memcpy function:
void* _memcpy(void* destination, const void* source, size_t num)
{
    char* d = (char*)destination;
    const char* s = (const char*)source;

    /* Copy one byte per iteration until num bytes have been copied. */
    while (num--)
        *d++ = *s++;

    return destination;
}
My friend told me this was a complete sh*t implementation, so I'm posting here to ask how I could improve it to at least match the performance of its CRT counterpart, and also to get an explanation of why it's so slow.
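For context, the loop above moves a single byte per iteration, so each byte pays for a load, a store, and the loop bookkeeping. One commonly suggested direction is to copy a machine word at a time once the pointers are aligned. The following is only a minimal sketch of that idea, not how any CRT actually does it. The name memcpy_words is hypothetical, it assumes the buffers do not overlap, and it assumes the compiler tolerates reading char buffers through uintptr_t pointers (strictly speaking this breaks C's aliasing rules, so GCC/Clang users may want -fno-strict-aliasing).

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */

/* Hypothetical helper for illustration; assumes non-overlapping buffers. */
void *memcpy_words(void *destination, const void *source, size_t num)
{
    unsigned char *d = (unsigned char *)destination;
    const unsigned char *s = (const unsigned char *)source;
    const size_t W = sizeof(uintptr_t);

    /* Word copies are only attempted when both pointers share the same
       misalignment, so that aligning one also aligns the other. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & (W - 1)) == 0) {
        /* Byte-copy until d (and therefore s) is word-aligned. */
        while (num > 0 && ((uintptr_t)d & (W - 1)) != 0) {
            *d++ = *s++;
            num--;
        }

        /* Bulk copy: one machine word per iteration.
           Assumption: the compiler tolerates this type-punned access. */
        uintptr_t *dw = (uintptr_t *)d;
        const uintptr_t *sw = (const uintptr_t *)s;
        while (num >= W) {
            *dw++ = *sw++;
            num -= W;
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Remaining tail bytes, or the whole buffer if the alignments differ. */
    while (num--)
        *d++ = *s++;

    return destination;
}
```

Whether this gets close to the CRT version depends on the compiler, optimization flags, and CPU, so it is worth measuring rather than assuming. It also falls back to the byte loop whenever the source and destination have different alignments.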