
My motivation for this is to write some Z80 Assembly code to sort the TI-83+ series' Variable Allocation Table (VAT), but I am also interested in this as a general problem.

The part of the VAT that I want to sort is arranged in contiguous memory, with each entry consisting of some fixed-size data, followed by a size byte for the name, then the name itself. To complicate matters, there are two stacks located on either side of the VAT, leaving no wiggle room to safely pad it with allocated RAM.

Ideally, I want to use O(1) space, as I have ready access to two 768-byte non-user RAM buffers. I also want it to be fast, as the VAT can contain many entries and this is a 6MHz processor (effectively 1 MIPS, though, as there is no instruction pipeline). It's also important to note that each entry is at least 8 bytes and at most 15 bytes.

The best approach that I've been able to think up relies on block memory transfers, which aren't particularly fast on the Z80. In the past, others have implemented an insertion sort, but it wasn't particularly efficient. As well, while I can write (and have written) code that collects pointers to all of the entries into an array and sorts them, it requires a variable amount of space, so I have to allocate user RAM, which is already in short supply.

I feel like it vaguely reminds me of some combinatorial trick I came across once, but for the life of me, a good solution to this problem has evaded me. Any help would be much appreciated.

Zeda

1 Answer


Divide the table into N pieces, each small enough to be sorted by your existing code using the fixed-size temporary buffers available. Then merge the N sorted pieces, as in a merge sort, to produce the final result.

Instead of an N-way merge, it may be easiest to combine the N pieces pairwise using 2-way merges.
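As a rough illustration of the split-then-merge idea, here is a minimal Python sketch rather than Z80 code. It assumes a hypothetical entry layout of 6 fixed bytes, one name-length byte, then a 1 to 8 byte name (which matches the 8 to 15 byte range in the question but is otherwise a guess), and it sorts by name. All function names are made up for the example. The chunk sort stands in for whatever existing buffer-based sort is available, and the merge uses a rotation by three reversals, which is only one possible way to merge adjacent runs without extra memory; it moves O(n^2) bytes in the worst case.

```python
FIXED = 6        # assumed size of the fixed part of an entry (hypothetical)
BUF_SIZE = 768   # size of one scratch buffer, as given in the question

def entry_len(table, pos):
    """Total length of the entry starting at byte offset pos."""
    return FIXED + 1 + table[pos + FIXED]

def entry_key(table, pos):
    """Name bytes of the entry at pos, used here as the sort key."""
    start = pos + FIXED + 1
    return bytes(table[start:start + table[pos + FIXED]])

def split_runs(table, end):
    """Chunk boundaries on entry edges, each chunk at most BUF_SIZE bytes."""
    bounds, pos = [0], 0
    while pos < end:
        n = entry_len(table, pos)
        if pos + n - bounds[-1] > BUF_SIZE:
            bounds.append(pos)
        pos += n
    bounds.append(end)
    return bounds

def sort_chunk(table, start, end):
    """Sort one chunk via a bounded scratch copy (stand-in for an existing sort)."""
    assert end - start <= BUF_SIZE
    scratch = bytes(table[start:end])
    entries, pos = [], 0
    while pos < len(scratch):
        n = FIXED + 1 + scratch[pos + FIXED]
        entries.append(scratch[pos:pos + n])
        pos += n
    entries.sort(key=lambda e: e[FIXED + 1:])
    table[start:end] = b"".join(entries)

def reverse(table, i, j):
    """Reverse table[i:j] in place, byte by byte."""
    j -= 1
    while i < j:
        table[i], table[j] = table[j], table[i]
        i += 1
        j -= 1

def rotate(table, lo, mid, hi):
    """Move table[mid:hi] in front of table[lo:mid] using three reversals."""
    reverse(table, lo, mid)
    reverse(table, mid, hi)
    reverse(table, lo, hi)

def merge_in_place(table, lo, mid, hi):
    """Merge sorted runs table[lo:mid] and table[mid:hi] without a copy."""
    while lo < mid and mid < hi:
        key = entry_key(table, lo)
        if key <= entry_key(table, mid):
            lo += entry_len(table, lo)
            continue
        # Find the block of right-run entries that sort before table[lo].
        stop = mid
        while stop < hi and entry_key(table, stop) < key:
            stop += entry_len(table, stop)
        rotate(table, lo, mid, stop)
        lo += stop - mid
        mid = stop

def sort_table(table):
    """Sort each chunk, then merge adjacent runs pairwise until one remains."""
    bounds = split_runs(table, len(table))
    for a, b in zip(bounds, bounds[1:]):
        sort_chunk(table, a, b)
    while len(bounds) > 2:
        merged = [bounds[0]]
        for i in range(0, len(bounds) - 1, 2):
            mid = bounds[i + 1]
            hi = bounds[i + 2] if i + 2 < len(bounds) else mid
            merge_in_place(table, bounds[i], mid, hi)
            merged.append(hi)
        bounds = merged

def make_entry(name, fill=0):
    """Build one entry in the assumed layout (test helper)."""
    return bytes([fill] * FIXED) + bytes([len(name)]) + name

def names(table):
    """List the names in table order (test helper)."""
    out, pos = [], 0
    while pos < len(table):
        out.append(entry_key(table, pos))
        pos += entry_len(table, pos)
    return out

table = bytearray(make_entry(b"PRGM2") + make_entry(b"A") + make_entry(b"LIST"))
sort_table(table)
print(names(table))  # [b'A', b'LIST', b'PRGM2']
```

The list of run boundaries is kept only for readability; it grows with the number of chunks, so a constant-space version would have to rediscover the boundaries by walking the table or use a fixed run size in entries.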

When sorting each piece, it may be an advantage to use hash codes to avoid string comparisons. Radix sorting might also provide some benefit.

For copying data the Z-80's block move instructions LDIR and LDDR are fairly expensive but hard to beat. Unrolling LDIR into a series of LDI can be faster. Pointing the stack pointer at source and destination and using multiple POP and then PUSH can be faster but requires interrupts be disabled and a guarantee of no non-maskable interrupts occurring.
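To put rough numbers on the LDIR versus unrolled LDI comparison, the following Python sketch adds up the documented Z80 T-state counts (LDIR: 21 per repeated byte and 16 for the last; LDI: 16; conditional JP: 10). The 16x unroll factor and the `JP PE` loop around it are assumptions made for the example, and any calculator-specific wait states or interrupt overhead are ignored.

```python
LDIR_REPEAT = 21   # T-states per byte while BC != 0 after the transfer
LDIR_LAST = 16     # T-states for the final byte
LDI = 16           # T-states per LDI
JP_CC = 10         # T-states for a conditional JP, taken or not

def ldir_cost(n):
    """T-states to copy n bytes with a single LDIR."""
    return (n - 1) * LDIR_REPEAT + LDIR_LAST if n > 0 else 0

def unrolled_ldi_cost(n, unroll=16):
    """T-states for n LDIs with one JP PE per group of `unroll` (n must divide evenly)."""
    assert n % unroll == 0
    return n * LDI + (n // unroll) * JP_CC

print(ldir_cost(768))          # 16123
print(unrolled_ldi_cost(768))  # 12768
```

Under these assumptions, filling one 768-byte buffer takes about 21% fewer T-states with the unrolled loop, at the cost of the extra code bytes.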

George Phillips
  • I don't fully understand how I can ensure that this will use O(1) space. It did remind me of an in-place, non-recursive mergesort that I came up with a few years ago, though, and I've been working on implementing it for this use. Unfortunately, it is O(n^2) in the worst case (move (n^2+6n-7)/3 elements), but after analyzing it, the worst case is better than insertion sort (move (n^2-3n+2)/2 elements) after exactly 20 elements. As well, best case is O(n*log(n)) instead of O(n^2). I made an error in my average case estimate. – Zeda Sep 28 '18 at 14:25