Seeking algorithm for "dual-modulo array memmove"

Question

As the algorithm being requested is somewhat special, I'm trying to explain what I want first, then I'll ask concrete questions that I'm quite unsure about:

Introduction

The algorithm is not C-specific, but I'll start with a C-example, namely memmove(): Memmove can copy a region of memory to another, handling overlaps correctly. Basically, it means that if the source-destination address is lower than the source address, the first item will be copied and the index increases; otherwise the last item is copied, and the index decreases. So the ordering of the memory regions is important.

Problem

I want to move regions in an array-based ring buffer, that is a region may wrap at the end of the array, making it a lot harder to decide which region has a lower address. To make things even more complicated the actual modulo value being used for source and destination may be different. Let's go concrete, and assume this C prototype (just for clarity):

mod_move(item_t items[], unsigned s_i, unsigned d_i, unsigned s_mod, unsigned d_mod, unsigned len);

items[] holds the data items. s_i is the source index. d_i is the destination index. s_mod is the modulo value for s_i. d_mod is the modulo value for d_i. len is the length of the region to copy.

Illustration

Illustrating the blocks (free, f1, f2, f3, and f4 are free blocks; source, s1, and s2 are source blocks; dest, d1, and d2are destinations blocks;..` is indicating some random width):

A block in a ring buffer can be either (A) one contiguous range,
or (B) two contiguous ranges wrapped at the end:
   start   start+len         (start+len)%size  start
     v         v                     v           v     
+----+---------+----+           +----+-----------+----+
| f1 |  used   | f2 |           | u2 |   free    | u1 |
+----+---------+----+           +----+-----------+----+

So for source and destination, there are basically four cases:
AA: Use memmove() algorithm
+-..-+--------+-..-+
| f1 | source | f2 |
+-..-+--------+-..-+
+-..-+--------+-..-+
| f3 |  dest  | f4 |
+-..-+--------+-..-+

AB: ?
+-..-+--------+-..-+
| f1 | source | f2 |
+-..-+--------+-..-+
+-..-+--....--+-..-+
| d2 |  free  | d1 |
+-..-+--....--+-..-+

BA: ?
+-..-+--....--+-..-+
| s2 |  free  | s1 |
+-..-+--....--+-..-+
+-..-+--------+-..-+
| f1 |  dest  | f2 |
+-..-+--------+-..-+

BB: ?
+-..-+--....--+-..-+
| s2 |  free1 | s1 |
+-..-+--....--+-..-+
+-..-+--....--+-..-+
| d2 |  free2 | d1 |
+-..-+--....--+-..-+

Questions

How could a function int overlaps(unsigned s_i, unsigned d_i, unsigned s_mod, unsigned d_mod, unsigned len) be written that returns != 0 if the source region (described by s_i, s_mod, and len) overlaps the destination region (described by d_i, d_mod, and len)? The idea is to optimize mod_move() for non-overlapping regions, possibly using memcpy() (or equivalent). My first guess is like s_i + len > d_i || d_i + len > s_i, but I feel it's incomplete.
When deciding whether to use an incrementing or a decrementing index, I need to find out whether the source region or the destination region has the lower address; does that make any sense in the case of modulo?
If there's more than one item to copy it is intended to use either memmove() or memcpy(), and I wonder how many non-contiguous blocks could exist: Four at most?
Assuming that len is at most the minimum of s_mod and d_mod, is it guaranteed that an algorithm exists that does not lose any items?
The primitive algorithm I would like to replace would be like this:

if (index_increases) {
    while (len-- > 0) {
        items[d_i = (d_i + 1) % d_mod] = items[s_i = (s_i + 1) % s_mod];
    }
} else {
    s_i = (s_i + len - 1) % s_mod;
    d_i = (d_i + len - 1) % d_mod;
    while (len-- > 0) {
        items[d_i = (d_i + d_mod - 1) % d_mod] = items[s_i = (s_i + s_mod - 1) % s_mod];
    }
}

Is there any special case to consider that I forgot to mention?

Illustration of the Problem (Failed Example)

Consider this array with numbers ([23]133 means "the item at index 23 is 133"):

[0]110|[1]111|[2]112|[3]113|[4]114|[5]115|[6]116|[7]117|[8]118|[9]119|[10]120|[11]121|[12]122|[13]123|[14]124|[15]125|[16]126|[17]127|[18]128|[19]129|[20]130|[21]131|[22]132|[23]133|[24]134|[25]135|[26]136|[27]137|[28]1|[29]1|[30]1|[31]1|[32]1|[33]1|[34]1|[35]1|[36]1|[37]1|[38]101|[39]102|[40]103|[41]104|[42]105|[43]106|[44]107|[45]108|[46]109

And the operation desired is mod_move(s_i=44, s_mod=47, d_i=0, d_mod=44, len=31), meaning "move 31 items starting from position 44 (modulo 47) to position 0 (modulo 44)".

Here is an attempt that failed (DDD: [d] = [s] (n) meaning "debug message: array element at s containing n was copied to element d"):

DDD: [0] = [44] (107)
DDD: [1] = [45] (108)
DDD: [2] = [46] (109)
DDD: [3] = [0] (107)
DDD: [4] = [1] (108)
DDD: [5] = [2] (109)
DDD: [6] = [3] (107)
DDD: [7] = [4] (108)
DDD: [8] = [5] (109)
DDD: [9] = [6] (107)
DDD: [10] = [7] (108)
DDD: [11] = [8] (109)
DDD: [12] = [9] (107)
DDD: [13] = [10] (108)
DDD: [14] = [11] (109)
DDD: [15] = [12] (107)
DDD: [16] = [13] (108)
DDD: [17] = [14] (109)
DDD: [18] = [15] (107)
DDD: [19] = [16] (108)
DDD: [20] = [17] (109)
DDD: [21] = [18] (107)
DDD: [22] = [19] (108)
DDD: [23] = [20] (109)
DDD: [24] = [21] (107)
DDD: [25] = [22] (108)
DDD: [26] = [23] (109)
DDD: [27] = [24] (107)
DDD: [28] = [25] (108)
DDD: [29] = [26] (109)
DDD: [30] = [27] (107)

Another Example

Source size is 35, source position is 27. Destination size is 27, destination position is 0. Length is 27.

Original values were:

[0]109|[1]110|[2]111|[3]112|[4]113|[5]114|[6]115|[7]116|[8]117|[9]118|[10]119|[11]120|[12]121|[13]122|[14]123|[15]124|[16]125|[17]126|[18]127|[19]128|[20]129|[21]130|[22]131|[23]132|[24]133|[25]134|[26]135|[27]101|[28]102|[29]103|[30]104|[31]105|[32]106|[33]107|[34]108

Incrementing the indices would lead to:

DDD: [0] = [27] (101)
DDD: [1] = [28] (102)
DDD: [2] = [29] (103)
DDD: [3] = [30] (104)
DDD: [4] = [31] (105)
DDD: [5] = [32] (106)
DDD: [6] = [33] (107)
DDD: [7] = [34] (108)
DDD: [8] = [0] (101)
DDD: [9] = [1] (102)
...

[8] should have gotten value 109, originally stored in [0]. So you get the idea.

score 0 · Answer 1 · answered Jan 10 '22 at 21:24

Finally I succeeded with the algorithm (after the fifth basic re-write): In the attempts before I had solved about 95% of the cases, but the cases left were highly complex, so I inverted the decision tree.

Before doing the actual resize I had to distinguish these three cases:

There is enough capacity to keep (and move) all items
Items have to be dropped at the tail of the queue
Items have to be dropped at the head of the queue

In addition after handling (2) or (3) more sub-cases could be possible. I also had realized that I would not be able to find the "dual-modulo array memmove" algorithm that would correctly move overlapping ranges using two modulo numbers.

Instead of distinguishing the configuration cases first, and then handle the cases (1) through (3), I added code to drop extra items first, as that may alter the basic configuration cases to be handled.

After that I had four cases (from which three possibly could be handled as one):

The first three cases are the non-wrapped ones, namely:

A) The allocated items are neither at the start, nor at the end of the array

B) The allocated items are at the end of the array

C) The allocated items are at the beginning of the array
The final case is the wrapped one:

D) The allocated items wrap at the end of the array

Cases A), B), and C) finally have three sub-cases each, while D) finally has two sub-cases.

Sub-cases in the image are named A1 to A5, B1 to B3, C1 to C3, and D1 to D4. The cases A0, B0, C0, and D0 need no action (size not changed). X (orange) are the items to remove, green are the new array elements and red are the array elements to remove. White elements are not used, while gray items are used, but not changed.

Altogether that makes 11 cases to handle. As said before some further reduction seems possible.

Finally I'd like to note that there is one case where all elements have to be moved in place, but I found an algorithm that only needs one extra element.

Simulation

The verification of my algorithm was done though random simulation (All the algorithms before failed at some point, but the final one succeeded).

The test algorithms creates a random-sized queue, adds a random number of items and shifts the queue by a random number of positions. For simplicity the first item is always number 101.

Then the test does a number of queue resizes and then checks whether the itemss in the queue are still complete.

The tests for one queue repeat a number of times, just as the test queues are created 1000 times.

For example "cap=63 ipos=6 count=19" is a queue of capacity 63 with 19 items starting at position 6:

d=.|.|.|.|.|.|[h=6]101|[7]102|[8]103|[9]104|[10]105|[11]106|[12]107|[13]108|[14]109|[15]110|[16]111|[17]112|[18]113|[19]114|[20]115|[21]116|[22]117|[23]118|[24]119|[t=25]|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.

(Empty elements are denoted by ., h is the head, t is the tail position)

After resizing to "cap=25" and using 25 items the array is:

d=[0]120|[1]121|[2]122|[3]123|[4]124|[5]125|[h=t=6]101|[7]102|[8]103|[9]104|[10]105|[11]106|[12]107|[13]108|[14]109|[15]110|[16]111|[17]112|[18]113|[19]114|[20]115|[21]116|[22]117|[23]118|[24]119

After resizing to "cap=83" the array is:

d=[0]120|[1]121|[2]122|[3]123|[4]124|[5]125|[h=6]101|[7]102|[8]103|[9]104|[10]105|[11]106|[12]107|[13]108|[14]109|[15]110|[16]111|[17]112|[18]113|[19]114|[20]115|[21]116|[22]117|[23]118|[24]119|[25]120|[26]121|[27]122|[28]123|[29]124|[30]125|[t=31]|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.

And after resizing to 22, dropping at the head the array is:

d=[0]117|[1]118|[2]119|[3]120|[4]121|[5]122|[6]123|[7]124|[8]125|[h=t=9]104|[10]105|[11]106|[12]107|[13]108|[14]109|[15]110|[16]111|[17]112|[18]113|[19]114|[20]115|[21]116