I need to efficiently swap the byte order of an array during copying into another array.
The source array has a known element type (char, short or int), so the byte swapping required is unambiguous and follows from that type.
My plan is to do this very simply with a multi-pass byte-wise copy (2 passes for short, 4 for int, ...). However, are there any pre-existing "memcpy_swap_16/32/64" functions or libraries? Perhaps something from image processing, where BGR/RGB conversion is a similar problem.
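To make the multi-pass plan concrete, here is a minimal sketch of what I have in mind. The name memcpy_swap_multipass and its parameters are my own invention, not an existing library function, and it assumes dst and src do not overlap:

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): copy `count` elements of
   `elem_size` bytes each from src to dst, reversing the byte order of every
   element. It makes one pass over the data per byte position. */
static void memcpy_swap_multipass(void *dst, const void *src,
                                  size_t count, size_t elem_size)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    for (size_t pass = 0; pass < elem_size; ++pass) {
        /* Byte `pass` of each source element lands at the mirrored
           position in the corresponding destination element. */
        size_t out_offset = elem_size - 1 - pass;
        for (size_t i = 0; i < count; ++i) {
            d[i * elem_size + out_offset] = s[i * elem_size + pass];
        }
    }
}
```

For an array of n ints this would be called as memcpy_swap_multipass(dst, src, n, sizeof(int)); with elem_size of 1 it degenerates to a plain byte copy.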
EDIT
I know how to swap the bytes of individual values; that is not the problem. I want to do this as part of a copy that I am going to perform anyway.
For example, if I have an array of little-endian 4-byte integers, I can do the swap by performing 4 byte-wise copies with initial offsets of 0, 1, 2 and 3 and a stride of 4. But there may be a better way: reading each 4-byte integer individually and using the byte-swap intrinsics _byteswap_ushort, _byteswap_ulong and _byteswap_uint64 might be faster. Still, I suspect there must be existing functions that do this type of processing.
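The single-pass alternative for the 4-byte case might look roughly like the sketch below. The names swap32 and memcpy_swap_32 are hypothetical, and I have written the swap with portable shifts rather than the MSVC-specific _byteswap_ulong so that it compiles anywhere; as far as I know, optimising compilers often recognise this pattern and emit a single byte-swap instruction, but I have not measured it. It assumes dst and src do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable 32-bit byte swap. On MSVC, _byteswap_ulong could be
   substituted here. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Hypothetical helper (not a library function): copy `count` 32-bit
   elements from src to dst, byte-swapping each one on the way.
   memcpy is used for the loads and stores so that unaligned buffers
   are handled safely. */
static void memcpy_swap_32(void *dst, const void *src, size_t count)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    for (size_t i = 0; i < count; ++i) {
        uint32_t v;
        memcpy(&v, s + 4 * i, sizeof v);   /* load one element */
        v = swap32(v);                     /* reverse its bytes */
        memcpy(d + 4 * i, &v, sizeof v);   /* store to destination */
    }
}
```

This reads and writes each element exactly once, whereas the strided byte-wise approach walks the whole array once per byte position.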
EDIT 2
Just found this, which may be a useful basis for an SSE version, though it's true that memory bandwidth probably makes it a waste of time.