My motivation is to write some Z80 assembly code that sorts the TI-83+ series' Variable Allocation Table (VAT), but I am also interested in this as a general problem.
The part of the VAT that I want to sort is laid out in contiguous memory, with each entry consisting of some fixed-size data, followed by a size byte for the name, then the name itself. To complicate matters, there is a stack on either side of the VAT, so there is no wiggle room to safely pad it with allocated RAM.
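To make the layout concrete, here is a minimal Python sketch (Python rather than Z80 only so that it is easy to check) that walks entries stored as described above. The 6-byte fixed part is an assumption: it is what you get if names are 1 to 8 bytes and entries are 8 to 15 bytes. `FIXED_SIZE`, `make_entry`, and `entry_spans` are hypothetical names used for illustration only.

```python
FIXED_SIZE = 6  # assumed size of the fixed data that precedes the size byte


def make_entry(name, fill=0):
    """Build one illustrative entry: fixed data, size byte, then the name."""
    return bytes([fill] * FIXED_SIZE) + bytes([len(name)]) + name


def entry_spans(buf, start, end):
    """Yield (offset, length, name) for each entry in buf[start:end]."""
    pos = start
    while pos < end:
        name_len = buf[pos + FIXED_SIZE]        # size byte follows the fixed data
        length = FIXED_SIZE + 1 + name_len      # total bytes used by this entry
        name = bytes(buf[pos + FIXED_SIZE + 1 : pos + length])
        yield pos, length, name
        pos += length                           # entries are packed back to back


table = make_entry(b"PRGM") + make_entry(b"A") + make_entry(b"LONGNAME")
for offset, length, name in entry_spans(table, 0, len(table)):
    print(offset, length, name)
```

Because the size byte sits after the fixed data, the table can only be walked cheaply in one direction, which is part of what makes sorting it in place awkward.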
Ideally, I'd want to use O(1) space, since I have ready access to two 768-byte non-user RAM buffers. I also want it to be fast, because the VAT can contain many entries and this is a 6 MHz processor (effectively about 1 MIPS, though, since there is no instruction pipeline). It's also important to note that each entry is at least 8 bytes and at most 15 bytes.
The best approach that I've been able to think up relies on block memory transfers, which aren't particularly fast on the Z80. In the past, others have implemented an insertion sort, but it wasn't particularly efficient. Also, while I can write (and have written) code that collects pointers to all of the entries into an array and sorts those pointers, that requires a variable amount of space, so I have to allocate user RAM, which is already in short supply.
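For reference, the block-transfer insertion sort mentioned above can be sketched as follows, again in Python for checkability. This is only a baseline under stated assumptions, not a proposed solution: it assumes a 6-byte fixed part, compares entries by raw name bytes, and uses a 15-byte scratch area (the maximum entry size), so the extra space is O(1). All function and constant names are hypothetical.

```python
FIXED_SIZE = 6   # assumed size of the fixed data before the size byte
MAX_ENTRY = 15   # maximum entry size stated in the question


def entry_len(buf, pos):
    """Total length of the entry starting at pos."""
    return FIXED_SIZE + 1 + buf[pos + FIXED_SIZE]


def entry_name(buf, pos):
    """Name bytes of the entry starting at pos."""
    return bytes(buf[pos + FIXED_SIZE + 1 : pos + entry_len(buf, pos)])


def insertion_sort_in_place(buf, start, end):
    """Sort the packed entries in buf[start:end] by name, in place."""
    scratch = bytearray(MAX_ENTRY)   # constant-size scratch, holds one entry
    cur = start
    while cur < end:
        length = entry_len(buf, cur)
        name = entry_name(buf, cur)
        # Scan the sorted prefix from the front; entries cannot be walked backwards.
        dest = start
        while dest < cur and entry_name(buf, dest) <= name:
            dest += entry_len(buf, dest)
        if dest < cur:
            scratch[:length] = buf[cur : cur + length]       # save current entry
            buf[dest + length : cur + length] = buf[dest:cur]  # block shift (LDDR-style on a Z80)
            buf[dest : dest + length] = scratch[:length]     # drop entry into the gap
        cur += length                # the next unsorted entry still starts here
```

Each insertion moves up to the whole sorted prefix, so the worst case is quadratic in the number of bytes moved, which is the cost I would like to avoid.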
This vaguely reminds me of some combinatorial trick I came across once, but for the life of me, a good solution to this problem has evaded me. Any help would be much appreciated.