This is the problematic piece of code:
#define CSSPDR SPI::Mem[2] // Data Receive / Data Transmit
#define CSSPSR SPI::Mem[3] // SPI Status Register
#define SPI_BUSY ((CSSPSR & 0x10) == 0x10)
#define SPI_READ_BUFF_EMPTY ((CSSPSR & 0x4) == 0)
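ushort Comm(ushort value)
{
    ulong w1 = 1000000, w2 = 1000000;  // iteration budgets for the two waits
    ulong ret;
    CSSPDR = value;                    // writing the data register starts the transfer
    while (SPI_BUSY && (w1 > 0)) { --w1; }  // wait for the send to complete
    do {
        ret = CSSPDR;                  // keep reading; the last value read is returned
    } while (!SPI_READ_BUFF_EMPTY && (--w2 > 0));
    return ret;
}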
The above is a bit of problem code for SPI communication: send a value by writing it to a special register, wait for the send to complete, then read the reply and keep reading until the receive buffer is empty (if there was some junk buffered, it just gets overwritten; the last value to arrive is the good one). The wait states, however, are implemented as ugly while() loops with counters.
The remote device takes well under 1 ms to process the information and send it back, but there's a lot of data to transfer. It works okay as long as the communication goes without a hitch: usually the reply arrives within a couple hundred iterations, and rarely, with some noise, within several thousand. Very rarely something is lost to a timeout, which is not a big deal: the faulty readout is replaced by a good one a couple of milliseconds later, and the filtering functions deal with the glitch.
But if the flexible tape connecting the CPU to the remote device is damaged, the communication dies, and the timeout counters start running to their maximum every single time. As a side effect, the entire application grinds to a halt, with the vast majority of CPU time wasted waiting in these loops. This happens very rarely, but the result is quite ugly, and I'd much prefer a solution that doesn't break the entire system when what is definitely a non-critical part of it fails.
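To put a number on that failure mode, here is a minimal simulation. It assumes the status register gets stuck with BUSY set and the read buffer never reported empty; that stuck state is an assumption, and the real hardware may fail differently. SPI::Mem is replaced by a plain array, and CountSpins and SimulateDeadLink are hypothetical names for an instrumented copy of Comm, not part of the original code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the memory-mapped SPI block; on the real
// system SPI::Mem refers to hardware registers.
namespace SPI { volatile std::uint32_t Mem[4] = {0, 0, 0, 0}; }

#define CSSPDR SPI::Mem[2] // Data Receive / Data Transmit
#define CSSPSR SPI::Mem[3] // SPI Status Register
#define SPI_BUSY ((CSSPSR & 0x10) == 0x10)
#define SPI_READ_BUFF_EMPTY ((CSSPSR & 0x4) == 0)

struct SpinCount {
    unsigned long busy_polls; // iterations spent in the first wait loop
    unsigned long reads;      // data register reads in the second loop
};

// Same control flow as Comm(), but counting iterations instead of
// returning the received value.
SpinCount CountSpins(std::uint16_t value) {
    unsigned long w1 = 1000000, w2 = 1000000;
    SpinCount n = {0, 0};
    std::uint32_t ret;
    CSSPDR = value;
    while (SPI_BUSY && (w1 > 0)) { --w1; ++n.busy_polls; }
    do {
        ret = CSSPDR;
        ++n.reads;
    } while (!SPI_READ_BUFF_EMPTY && (--w2 > 0));
    (void)ret; // the value is irrelevant for this measurement
    return n;
}

// Assumed dead-link state: BUSY never clears and the buffer never
// reports empty, so both loops run until their counters expire.
SpinCount SimulateDeadLink() {
    SPI::Mem[3] = 0x10 | 0x4;
    return CountSpins(0x1234);
}
```

Under that assumption, every single call burns both budgets in full: 1000000 status polls followed by 1000000 data register reads, with nothing else getting a chance to run in between.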
If I did usleep(1), I'd never approach the desired throughput, as it hands control back to the kernel for a stretch of time usually considerably longer than 1 ms (the call only asks not to be woken up earlier than 1 µs; the kernel is free to make the sleep longer, and usually does, by quite a bit, even up to 100 ms if other tasks are busy). Similarly, putting anything 'heavyweight' or otherwise time-consuming inside the loops would slow the communication down unacceptably, as the reaction to the BUSY bit clearing would be delayed. A delay on the order of 100 µs between readouts is about the most I could afford, optionally with an initial delay of 500 µs between sending the data and starting to poll the BUSY bit (there's zero chance it will clear any earlier).
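The sleep overshoot is easy to check on a given machine. The following is a minimal sketch, assuming a POSIX system where usleep is available; MeasureUsleep1Ns is a hypothetical helper name, and the figure it returns depends entirely on the kernel, timer configuration and system load.

```cpp
#include <chrono>
#include <unistd.h> // usleep (POSIX)

// Returns how long a single usleep(1) call actually took, in nanoseconds,
// measured with a monotonic clock.
long long MeasureUsleep1Ns() {
    const auto start = std::chrono::steady_clock::now();
    usleep(1); // asks for at least 1 microsecond; the kernel may take longer
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling this a few thousand times and looking at the worst case, rather than the average, gives a fair picture of how much latency a sleep inside the polling loop would add.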
The timeout values of 1000000 iterations were found experimentally as something that "almost never fails". I could try pushing them down, but that doesn't really solve the problem; it just reduces the pain a bit (and may have the side effect of breaking communications that just take a while to get across).
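For comparison, the iteration budget could be expressed as a time budget, so that the limit no longer depends on how fast the CPU happens to spin. This is only a sketch of that variation, not a fix: PollWithDeadline is a hypothetical helper, it is still a busy-wait, and the cost of reading the clock on every iteration would have to be measured on the target.

```cpp
#include <chrono>

// Calls `done` repeatedly until it returns true or `budget` has elapsed.
// Returns true if `done` succeeded in time, false on timeout.
template <typename Pred>
bool PollWithDeadline(Pred done, std::chrono::microseconds budget) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    while (!done()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false; // timed out; the caller decides what to do next
        }
    }
    return true;
}
```

With the macros above, the first wait would become something like PollWithDeadline([] { return !SPI_BUSY; }, std::chrono::microseconds(1000)). A dead link would then cost a known 1 ms per call instead of a million iterations, but it would still cost that on every call.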