
I want to write a delay loop in assembly. It should create a delay of N cycles.

My idea was to create a for loop and iterate over the NOP instruction. In this case, would I have to decrease N by the number of cycles caused by other parts of the program, such as calling the for loop? Moreover, does each iteration of the for loop count as 1 cycle or 2 cycles?

Ideally, does anyone have an implementation of such a delay loop?

Peter Cordes
spadel
  • Why don't you try it and see? – Erik Eidt Mar 11 '21 at 18:18
  • Every assembly language instruction takes time. So if this will be a function called by some other location in code, the stack push and pop done by the subroutine call and return will definitely need to be factored in to the timing constant. – RufusVS Mar 11 '21 at 23:27
  • It depends on the details of the microarchitecture, like 68000 vs. 68010 with its 2-instruction loop buffer, vs. 68020 with an I-cache (see https://en.wikipedia.org/wiki/Motorola_68000_series#Improvement_history), vs. 68060 with a 2-wide superscalar pipeline (2 instructions per cycle). – Peter Cordes Mar 12 '21 at 01:12
  • One obvious approach at investigation is to inspect the 68k bogomips loop from an older linux kernel. :-) – oakad Mar 12 '21 at 02:17
  • Just a single `dbf d0,$-2` should do the job. According to https://wiki.neogeodev.org/index.php?title=68k_instructions_timings#Conditional_instructions each iteration would take 10 cycles. – chtz Mar 12 '21 at 13:12

1 Answer


There is no 68k instruction that executes in exactly one cycle. Even a simple NOP takes four cycles on the 68000, so you will need to adjust your expectations a bit.

The simplest delay loop one can imagine is

       move.w #delay-1,d0
loop:  dbf    d0,loop       ; 10 cycles per loop + 14 cycles for last 
                            ; (branch not taken) 

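This delays for roughly delay * 10 cycles: 10 cycles for each of the delay - 1 taken branches plus 14 for the final pass, i.e. 10 * delay + 4 cycles, not counting the move.w. Note that delay is word-sized (delay - 1 must fit in 16 bits), so the construct is limited to delays between 14 and about 655,000 cycles. If you want a wider range, you need a different construct that uses a long-word counter: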

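       move.l  #delay,d0
       moveq.l #1,d1
loop:  sub.l   d1,d0        ; 8 cycles for Dn.l->Dn.l (6 base + 2 for a register source)
       bne.s   loop         ; 10 cycles for branch taken, 8 when it falls through

This eats 18 cycles per iteration (8 + 10; see Table 8-4 of the MC68000 User's Manual). It does, however, accept a long-word loop counter.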
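
On the question of overhead: yes, fixed costs outside the loop have to be subtracted from N before dividing by the per-iteration cost. Below is a minimal sketch of that arithmetic for the `dbf` loop above, written in Python purely as a calculator. The names `dbf_loop_cycles`, `dbf_count_for` and the `overhead` parameter are hypothetical, and the timings assume a plain 68000: 8 cycles for `move.w #imm,d0`, 10 per taken `dbf`, 14 for the final one.

```python
MOVE_W_IMM = 8    # move.w #imm,d0 on a 68000
DBF_TAKEN = 10    # dbf, counter not expired (branch taken)
DBF_EXIT = 14     # dbf, counter expired (falls through)


def dbf_loop_cycles(delay):
    """Cycles for 'move.w #delay-1,d0' followed by 'loop: dbf d0,loop'."""
    if not 1 <= delay <= 65536:
        raise ValueError("delay-1 must fit in a 16-bit word")
    return MOVE_W_IMM + DBF_TAKEN * (delay - 1) + DBF_EXIT


def dbf_count_for(n_cycles, overhead=0):
    """Return (delay, leftover) with overhead + loop + leftover == n_cycles.

    'overhead' is whatever the caller spends outside the loop (an assumed,
    illustrative parameter), e.g. a subroutine call and return.
    """
    budget = n_cycles - overhead - MOVE_W_IMM - DBF_EXIT
    if budget < 0:
        raise ValueError("n_cycles is shorter than the minimum delay")
    # Round down to a whole number of taken branches, capped at the word range.
    delay = min(budget // DBF_TAKEN + 1, 65536)
    leftover = n_cycles - overhead - dbf_loop_cycles(delay)
    return delay, leftover
```

For example, if the loop sits in a subroutine reached with `bsr` (18 cycles) and left with `rts` (16 cycles), `dbf_count_for(1000, overhead=34)` gives a count of 95 with 4 cycles left over, which a single `nop` would pad. The target can only be hit exactly when the leftover can be built from real instructions, so very fine-grained values of N are not reachable this way.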

If you want to increase the achievable delay, you may think about nested delay loops or more complex instructions and addressing modes inside the loop. These two are, however, the shortest possible delay loops.
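
To get a feel for what nesting buys, here is a small sketch that adds up the cycles of two nested `dbf` loops. It is Python used as a calculator, not 68k code; the nesting shown in the comment is one assumed way to arrange the loops, `nested_dbf_cycles` is a hypothetical name, and the timings are again those of a plain 68000.

```python
MOVE_W_IMM = 8   # move.w #imm,Dn
DBF_TAKEN = 10   # dbf, branch taken
DBF_EXIT = 14    # dbf, counter expired


def nested_dbf_cycles(outer, inner):
    """Total cycles for this assumed nesting of the dbf loop:

            move.w #outer-1,d1
    outer:  move.w #inner-1,d0    ; reload, d0.w is $FFFF after the inner loop
    inner:  dbf    d0,inner
            dbf    d1,outer
    """
    for count in (outer, inner):
        if not 1 <= count <= 65536:
            raise ValueError("each count minus 1 must fit in a 16-bit word")
    # One pass of the outer body: reload d0, then run the inner loop to the end.
    inner_pass = MOVE_W_IMM + DBF_TAKEN * (inner - 1) + DBF_EXIT
    # Outer dbf is taken outer-1 times and falls through once.
    return MOVE_W_IMM + outer * inner_pass + DBF_TAKEN * (outer - 1) + DBF_EXIT
```

With both counts at their maximum of 65536 this comes to roughly 4 × 10^10 cycles, far beyond what the single word-sized loop can reach.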

tofro
  • I assume these timings are from 68000 itself, not accounting for the loop buffer on 68010 or the I-cache on 68020. This also raises the question of whether Easy68k is a cycle-accurate simulator. (And if you're trying to write a game that runs at a constant speed, whether guest cycles correspond to real time, or whether it still depends on host speed. Perhaps @spadel knows the answer from using Easy68k.) – Peter Cordes Mar 13 '21 at 01:57
  • @PeterCordes The examples use original 68k cycle timings, just like easy68k does. easy68k does not claim to be cycle-exact, but it does keep track of (counts) the number of cycles a real CPU would have spent in a program. – tofro Mar 13 '21 at 08:53
  • I think `sub.l d1,d0` takes 8 cycles (same as `subq.l #1,d0`) – chtz Mar 13 '21 at 14:13
  • @chtz Hm. My manuals say "6". (Original 68000 Programmer's Reference Manual) – tofro Mar 13 '21 at 16:08
  • Table 8-4 here https://www.nxp.com/docs/en/reference-manual/MC68000UM.pdf has a note `**` which says "The base time of six clock periods is increased to eight if the effective address mode is register direct or immediate". Same info here: https://wiki.neogeodev.org/index.php?title=68k_instructions_timings#Standard_instructions – chtz Mar 13 '21 at 16:18