Are there optimized versions of memmove for when I know the direction?

Question

Imagine that I am implementing inserting and deleting within a small vector. (If this is C++ then assume further that the vector elements are trivially copyable.)

When inserting into the middle of this vector (assuming that I have ascertained that no reallocation is necessary), I know that the copy to make space for a new element must move bytes to higher addresses. Similarly, when implementing erase in the middle of this vector, I know that the copy to eliminate the erased object must move bytes to lower addresses.

memmove will sort this out, but it will spend time comparing the supplied addresses so as to choose a 'move up' or 'move down' loop. I expect my vectors to be quite small. (In reality they are the buckets in an open addressing, linear probing, RobinHood hash table.) Thus I am interested in optimizing the entire data move operation. My question is, can I eliminate that initial memmove start-up overhead? Ideally, I would like to achieve such an optimization across the big three platforms (Windows, Mac and Linux).

"but it will spend time comparing the supplied addresses so as to choose a 'move up' or 'move down' loop" --> _it will spend an [insignificant](https://softwareengineering.stackexchange.com/q/80084/94903) time comparing_ ... Better to spend your valuable time seeking significant time savings. E.g. post your best hash table with a test harness and ask for performance improvements. — chux - Reinstate Monica, Jan 21 '23 at 20:09
@chux, the exercise here is to show that the STL set and map abstractions have taken us far from Bjarne's "only pay for what you use" aphorism. Rest assured, I will post if I achieve something worth bringing forward. So far, preliminary comparisons to martinus/unordered_dense and skarupke/flat_hash_map (both on GitHub) look promising. — John Yates, Jan 23 '23 at 19:14

score 2 · Accepted Answer · answered Jan 21 '23 at 17:23

No. There is no function in the C Standard Library that copies overlapping memory ranges, when you know the copy direction at compile time, without having runtime checks for the correct copy direction.

But (C++ version)

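If you have access to C++, you can use the function templates std::copy and std::copy_backward to specify the direction at compile time: std::copy when the overlapping data moves to lower addresses, std::copy_backward when it moves to higher addresses.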
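
A minimal sketch of how the two calls map onto the insert and erase cases from the question. The helper names insert_at and erase_at are hypothetical, and the sketch assumes the caller has already ensured there is spare capacity for the insert. Note that standard library implementations may still lower these calls to memmove for trivially copyable types, so it is worth checking the generated code before assuming the direction check is gone.

```cpp
#include <algorithm>    // std::copy, std::copy_backward
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: insert value at index pos of data[0..size).
// Assumes capacity > size, i.e. no reallocation is needed.
template <typename T>
void insert_at(T* data, std::size_t size, std::size_t pos, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch assumes trivially copyable elements");
    assert(pos <= size);
    T tmp = value;  // value may refer to an element that is about to move
    // Elements move to higher addresses, so copy starting from the back.
    std::copy_backward(data + pos, data + size, data + size + 1);
    data[pos] = tmp;
}

// Hypothetical helper: erase the element at index pos of data[0..size).
template <typename T>
void erase_at(T* data, std::size_t size, std::size_t pos) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch assumes trivially copyable elements");
    assert(pos < size);
    // Elements move to lower addresses, so copy starting from the front.
    std::copy(data + pos + 1, data + size, data + pos);
}
```

The caller is responsible for updating its own element count after each call.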

But (possible compiler magic)

You may be able to copy/paste each platform's implementation into your own code and rely on the compiler to optimize out the direction checks when the compiler can reason about them at compile time.

But (since you're copying anyway)

If you decide to copy each platform's implementation, you might as well split memmove into separate my_memcpy_forward and my_memcpy_backward functions that omit the runtime check.
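
As a rough illustration of that split, here is a sketch using plain byte loops rather than any platform's actual implementation. The function names come from the paragraph above; the bodies are assumptions and are not tuned. An optimizer may also recognize loops like these and replace them with a library call, so inspect the generated code.

```cpp
#include <cassert>
#include <cstddef>

// Copies from low to high addresses. Safe for overlapping ranges only when
// dst is at or below src (the "erase" case), or when there is no overlap.
inline void* my_memcpy_forward(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
    return dst;
}

// Copies from high to low addresses. Safe for overlapping ranges only when
// dst is at or above src (the "insert" case), or when there is no overlap.
inline void* my_memcpy_backward(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = n; i > 0; --i) {
        d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Neither function checks the direction at run time, so calling the wrong one on overlapping ranges silently corrupts the data.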

As always

Premature optimization is the root of all evil, so profile your code to make sure this optimization even matters for your market needs.

Thanks for the pointer. I was unaware of std::copy_backward. In the interim, I had done some testing. I concluded that a two-pass algorithm with a backwards copy was slower than a single-pass, software-pipelined loop. — John Yates, Jan 23 '23 at 19:34
