Fast memmove for x86 and +1 shift (for Move-to-front transform)

Question

For a fast MTF ( http://en.wikipedia.org/wiki/Move-to-front_transform ) I need a faster way to move a char from inside an array to its front:

unsigned char mtfSymbol[256], front;
unsigned char i; // unsigned, so that indices 128..255 are not negative

for(;;) { // a very big loop
    ...
    i=get_i(); // i is in 0..255, but more likely to be small.

    front=mtfSymbol[i];
    memmove(mtfSymbol+1, mtfSymbol, i);
    mtfSymbol[0]=front;
}

Cachegrind shows that memmove causes a lot of branch mispredictions here.

For the other version of the code (the memmove in the first example replaced by this loop)

do
{
   mtfSymbol[i] = mtfSymbol[i-1];
} while ( --i );

there are a lot of byte reads/writes, conditional branches, and branch mispredictions.

i is usually not very big, because the MTF is applied to "good" input: a text file after a BWT (Burrows–Wheeler transform).
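For reference, here is a minimal sketch of a forward MTF encoder built on the same `memmove` step. The function name `mtf_encode` and the linear search for the rank are assumptions made for illustration, not the asker's actual code (the loop in the question receives the index from `get_i()` instead).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline: encode n input bytes into MTF ranks. */
static void mtf_encode(const unsigned char *in, unsigned char *out, size_t n)
{
    unsigned char table[256];
    for (int k = 0; k < 256; k++)
        table[k] = (unsigned char)k;       /* identity start table */

    for (size_t p = 0; p < n; p++) {
        unsigned char c = in[p];
        unsigned i = 0;
        while (table[i] != c)              /* every byte value is present, */
            i++;                           /* so this always terminates    */
        out[p] = (unsigned char)i;
        memmove(table + 1, table, i);      /* the +1 shift from the question */
        table[0] = c;
    }
}
```

For the clustered input "aaabbb" this yields the ranks 97, 0, 0, 98, 0, 0: once a symbol has been seen, repeats cost a zero-length move, which is why most moves are short on BWT output.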

Compiler is GCC.

Is there any reason to believe the supplied `memmove` can be improved on? Not knowing what you mean by MTF or BWT, can you avoid doing these moves? — David Thornley, Sep 14 '10 at 16:16
@David Thornley, this is a restricted case of moving. The most common case is moving a small part of a 256-element array. The displacement is fixed at +1. Also, this code is a hot spot, as it runs in full for every char of a 5 GByte file. — osgx, Sep 14 '10 at 16:45
MTF is usually applied when symbols are expected to appear in a temporally coherent manner, so whatever the input is, output will be "small" values (otherwise, using MTF makes no sense). Which means that most of the time, an element which is very close to the beginning needs to be moved to front. You should be able to hardcode special cases for the first 4-8 positions which basically rotate a register and write the resulting bit pattern back. The rest is good using standard `memmove`, since it's hard to do better, and that case doesn't occur often anyway. — Damon, Feb 26 '14 at 17:13
Damon, your comment is the best answer. Please make it an answer so I can vote on it! — Jeff Allen, Aug 14 '14 at 23:31
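Damon's suggestion of special-casing the first few positions could look roughly like the sketch below. It assumes a little-endian target (as on x86), a 256-byte table, and an 8-byte fast path; the name `mtf_move_to_front` is hypothetical, and whether this actually beats the library `memmove` would have to be measured.

```c
#include <stdint.h>
#include <string.h>

/* Move table[i] to table[0], shifting table[0..i-1] up by one.
   Assumes little-endian byte order and a table of 256 bytes. */
static inline void mtf_move_to_front(unsigned char *table, unsigned i)
{
    if (i == 0)
        return;                            /* already at the front */

    unsigned char front = table[i];
    if (i < 8) {
        uint64_t w;
        memcpy(&w, table, 8);              /* load the first 8 bytes */
        unsigned bits = 8 * (i + 1);       /* bits covering table[0..i] */
        uint64_t mask = (bits == 64) ? ~(uint64_t)0
                                     : (((uint64_t)1 << bits) - 1);
        /* Shift bytes 0..i-1 up one byte; old byte i falls off the mask. */
        uint64_t moved = (((w & mask) << 8) & mask) | front;
        w = (w & ~mask) | moved;           /* bytes above i stay untouched */
        memcpy(table, &w, 8);              /* one 8-byte store */
    } else {
        memmove(table + 1, table, i);      /* rare long move */
        table[0] = front;
    }
}
```

The short case replaces a variable-length copy with one load, a few register operations, and one store; the only remaining branch is `i < 8`, which should predict well if most indices are small.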

score 0 · Answer 1 · answered May 19 '13 at 06:19

You can also use a dedicated data structure rather than an array to speed up the forward transform. A fast implementation can be built with a list of linked lists to avoid the array element moves altogether.

See http://code.google.com/p/kanzi/source/browse/java/src/kanzi/transform/MTFT.java

For inverse transform, it turns out that arrays are as fast as linked lists.
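As a rough illustration of the idea in C, the sketch below keeps the symbols in a single singly linked list stored in an array. This is a simplification and an assumption: it is not the "list of linked lists" structure of the linked Java code, and the names `mtf_list`, `mtf_list_init`, and `mtf_list_encode` are hypothetical.

```c
#include <stddef.h>

/* Symbols form a singly linked list; next[s] is the symbol after s. */
typedef struct {
    unsigned char next[256];
    unsigned char head;
} mtf_list;

static void mtf_list_init(mtf_list *l)
{
    for (int s = 0; s < 256; s++)
        l->next[s] = (unsigned char)(s + 1); /* tail link is never followed */
    l->head = 0;
}

/* Return the rank of c and relink it to the front of the list. */
static unsigned mtf_list_encode(mtf_list *l, unsigned char c)
{
    unsigned rank = 0;
    unsigned char prev = l->head, cur = l->head;
    while (cur != c) {                     /* all 256 symbols are listed, */
        prev = cur;                        /* so the walk always ends     */
        cur = l->next[cur];
        rank++;
    }
    if (rank != 0) {
        l->next[prev] = l->next[cur];      /* unlink c */
        l->next[cur] = l->head;            /* relink c in front */
        l->head = cur;
    }
    return rank;
}
```

No elements are shifted, but finding the rank still requires walking the list, so the cost moves from copying bytes to chasing links.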

flanglet

This is an academically correct solution, but I doubt it is a good performer in real code (maybe in Java, but surely not in C++). Despite better algorithmic complexity `list` is 3-4 times slower than `vector` (or a raw array) except for very large objects. Also, when one uses MTF, this only makes sense if frequent characters occur clustered, that is moves will usually be very short distances. A custom `memmove` specialized for short moves will therefore perform well. – Damon Feb 19 '14 at 00:25
I agree with the overall comment. However, Java having no direct access to optimized (low level) instructions, the list is the fastest option (on Intel CPUs at least). I make no such claim for other languages. Since the MTF is typically used after a BWT, it is common for most characters to have low indexes. – flanglet May 07 '14 at 19:46
_"typically used after a BWT, it is common for most characters to have low indexes"_ well yes, that is what I'm saying :-) It only makes sense to use MTF in such a context, too. But then, the amount of memory that needs to be moved is necessarily tiny (usually 3-5, rarely up to 8-10 bytes), so a specialized `memmove` will be ultra fast. The overhead of manipulating two pointers in a list is higher, and that does not include the slower overall access and cache implications when _using_ the data. The Java VM admittedly hides this fact (as in your example), but the OP's question is about C. – Damon May 08 '14 at 13:07
Put differently, Java is nice for demonstrating the _academically correct_ solution, but it doesn't really help finding a high-performance solution in C (since Java is a virtual machine that is totally oblivious of how a CPU or how memory works, and thus in Java everything is equally slow). In practice, if one wants performance, using a list is almost always the wrong solution, even if it would seem that it is the right solution (and even if the "nature" of the data and its access pattern suggests list as the correct approach). Interesting benchmark: [vec vs list](http://tinyurl.com/knakou3) – Damon May 08 '14 at 13:10
You are right, the initial request was about C, and a vector is expected to be a better solution than a list in this context. – flanglet May 08 '14 at 19:04

score 0 · Answer 2 · answered Sep 14 '10 at 16:25

If you pre-allocate your buffer bigger than you're going to need it, and put your initial array somewhere in the middle (or at the end, if you're never going to have to extend it that way) then you can append items (up to a limit) by changing the address of the start of the array rather than by moving all the elements.

You'll obviously need to keep track of how far back you've moved, so you can re-allocate if you do fall off the start of your existing allocation, but this should still be quicker than moving all of your array entries around.
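A minimal sketch of this headroom approach, assuming a byte buffer and hypothetical names (`front_buf`, `front_buf_init`, `front_buf_push_front`). It covers only prepending at the front; it does not move an element out of the middle, which is what the MTF step needs.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;    /* start of the allocation */
    unsigned char *begin;   /* first live element */
    size_t len;             /* number of live elements */
} front_buf;

/* Reserve `headroom` free bytes before the data and `tail_cap` after it. */
static int front_buf_init(front_buf *b, size_t headroom, size_t tail_cap)
{
    b->base = malloc(headroom + tail_cap);
    if (b->base == NULL)
        return -1;
    b->begin = b->base + headroom;
    b->len = 0;
    return 0;
}

/* O(1) prepend: step the start pointer back instead of moving elements.
   Returns -1 once the headroom is used up; the caller must then
   reallocate with fresh headroom. */
static int front_buf_push_front(front_buf *b, unsigned char v)
{
    if (b->begin == b->base)
        return -1;
    *--b->begin = v;
    b->len++;
    return 0;
}
```

The caller releases the storage with `free(b.base)`, not `free(b.begin)`, since `begin` moves away from the start of the allocation.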

Andrew Aylett

Do you know the MTF? i is smaller than 256, so there is the part of the array to move, the i'th element, which will be moved to the front, and the long part after the i'th element, which must stay in place. So your suggestion will generate "holes". – osgx Sep 14 '10 at 16:31

@osgx: Military Treatment Facility? Manual Transmission Fluid? More To Follow? The most likely looks like Microsoft Tape Format, but there's other possibilities. At least BWT doesn't lead to a Wikipedia disambiguation page. – David Thornley Sep 14 '10 at 16:36
@David Thornley, sorry it is move-to-front transform, used in archivers, e.g. bzip2 – osgx Sep 14 '10 at 16:41
+1 even though this won't work for the OP's need. Nevertheless this is _generally_ still a good (and not as obvious as one would think, if you haven't heard of it before!) approach for the general "insert at front of array" problem. Some implementations of `deque` used to work like this, if I remember correctly. – Damon Feb 26 '14 at 17:09
