-1

I've been writing code to interface with a hardware register on SAMV71Q20B. The function in question is:

// Reads current MII link busy status
u32  mii_is_busy(void)
{
    return !GMAC->GMAC_NSR.bit.IDLE;
}

Compiled to Assembly
4B03        ldr r3, =0x40050000
6898        ldr r0, [r3, #8]
F3C00080    ubfx r0, r0, #2, #1
F0800001    eor r0, r0, #1
4770        bx lr
BF00        nop
40050000    .word 0x40050000

Inline version 1
4A05        ldr r2, =0x40050000
6891        ldr r1, [r2, #8]
0749        lsls r1, r1, #29
D1FA        bne 0x0042392C

Inline verison 2
4A08        ldr r2, =0x40050000
6890        ldr r0, [r2, #8]
0740        lsls r0, r0, #29
D507        bpl 0x0042399C

If I change it to

// Reads current MII link busy status
u32 __attribute__ ((noinline)) mii_is_busy(void)
{
    return !GMAC->GMAC_NSR.bit.IDLE;
}

Then it works correctly, the assembly is the same, but there is no inline version

We noticed this because the function used to be:

// Reads current MII link busy status
u32 mii_is_busy(void)
{
    return !(GMAC->GMAC_NSR.reg & GMAC_NSR_IDLE);
} 

Compiled to Assembly
4B03        ldr r3, =0x40050000
6898        ldr r0, [r3, #8]
F0800004    eor r0, r0, #4
F3C00080    ubfx r0, r0, #2, #1
4770        bx lr
BF00        nop
40050000    .word 0x40050000

Which for whatever reason was not getting inlined, and it also works fine

By working fine I mean our entire ethernet stack responds to pings, but making this small change causes the ethernet to no longer function.

The definitions of the registers I am using come from the CMSIS header for the chip. Here is the relevant snippet

/* -------- GMAC_NSR : (GMAC Offset: 0x08) (R/ 32) Network Status Register -------- */
#if !(defined(__ASSEMBLER__) || defined(__IAR_SYSTEMS_ASM__))
#if COMPONENT_TYPEDEF_STYLE == 'N'
typedef union { 
  struct {
    uint32_t :1;                        /**< bit:      0  Reserved */
    uint32_t MDIO:1;                    /**< bit:      1  MDIO Input Status                        */
    uint32_t IDLE:1;                    /**< bit:      2  PHY Management Logic Idle                */
    uint32_t :4;                        /**< bit:   3..6  Reserved */
    uint32_t RXLPIS:1;                  /**< bit:      7  LPI Indication                           */
    uint32_t :24;                       /**< bit:  8..31  Reserved */
  } bit;                                /**< Structure used for bit  access */
  uint32_t reg;                         /**< Type used for register access */
} GMAC_NSR_Type;
#endif
#endif /* !(defined(__ASSEMBLER__) || defined(__IAR_SYSTEMS_ASM__)) */

#define GMAC_NSR_OFFSET                     (0x08)                                        /**<  (GMAC_NSR) Network Status Register  Offset */

#define GMAC_NSR_MDIO_Pos                   1                                              /**< (GMAC_NSR) MDIO Input Status Position */
#define GMAC_NSR_MDIO_Msk                   (_U_(0x1) << GMAC_NSR_MDIO_Pos)                /**< (GMAC_NSR) MDIO Input Status Mask */
#define GMAC_NSR_MDIO                       GMAC_NSR_MDIO_Msk                              /**< \deprecated Old style mask definition for 1 bit bitfield. Use GMAC_NSR_MDIO_Msk instead */
#define GMAC_NSR_IDLE_Pos                   2                                              /**< (GMAC_NSR) PHY Management Logic Idle Position */
#define GMAC_NSR_IDLE_Msk                   (_U_(0x1) << GMAC_NSR_IDLE_Pos)                /**< (GMAC_NSR) PHY Management Logic Idle Mask */
#define GMAC_NSR_IDLE                       GMAC_NSR_IDLE_Msk                              /**< \deprecated Old style mask definition for 1 bit bitfield. Use GMAC_NSR_IDLE_Msk instead */
#define GMAC_NSR_RXLPIS_Pos                 7                                              /**< (GMAC_NSR) LPI Indication Position */
#define GMAC_NSR_RXLPIS_Msk                 (_U_(0x1) << GMAC_NSR_RXLPIS_Pos)              /**< (GMAC_NSR) LPI Indication Mask */
#define GMAC_NSR_RXLPIS                     GMAC_NSR_RXLPIS_Msk                            /**< \deprecated Old style mask definition for 1 bit bitfield. Use GMAC_NSR_RXLPIS_Msk instead */
#define GMAC_NSR_MASK                       _U_(0x86)                                      /**< \deprecated (GMAC_NSR) Register MASK  (Use GMAC_NSR_Msk instead)  */
#define GMAC_NSR_Msk                        _U_(0x86)                                      /**< (GMAC_NSR) Register Mask  */

My question is, does my compiler check out, like is it producing correct Assembly code? because if so I presume this must be some subtilty in the hardware that is causing this. I could just use one of the two working versions I've presented, but I hate to do that without understanding why one works and the other doesn't.

  • 1
    I don't see `volatile` anywhere in your code; that's surely a bug for reading an MMIO address which might change on its own, unlike a normal C object where the compiler can hoist a load out of a loop. Making the function non-inline defeats that kind of optimization, acting like a compiler memory barrier, but use `volatile` instead. Maybe you have `volatile` somewhere in the definition of `GMAC`, though, which you didn't show, since the asm for the inlined copies does appear to be loading and branching. We can't tell from context if it's been hoisted out of a loop, though. – Peter Cordes Jul 18 '23 at 20:38
  • Thanks for your comment, The CMSIS header defines each GMAC register using a "__I" "__O" or "__IO" macro. These are defined based upon the target compiler, but from what I understand they effectively are the same as volatile. – Zachary Vander Klippe Jul 18 '23 at 20:55

1 Answers1

-1

I figured it out, nothing like asking for help, then finding the problem minutes later.

My collogue had written this code:

// Wait for idle status.
// I saw 213 iterations once.
for (i32 i=0; i<500; i++)
    if (!mii_is_busy())
        return ERR_OK;

return ERR_TIMEOUT;

So the failure of the inline versions was due to the faster execution time, causing the loop to run out of iterations.

  • Is there any reason to want to time-out quickly? So you can fall back to a sleep that waits for an interrupt I guess, to save power? But you haven't implemented timeout handling yet I guess. (If you do want some large count so you always spin-wait except in actual failure cases, use some large power of 2 that ARM can use as an immediate efficiently, as a rotated 8-bit immediate.) – Peter Cordes Jul 18 '23 at 20:50
  • We clearly need to revise this code. A hardware timeout isr would be great. At present, I'm happy to understand what the issue was. – Zachary Vander Klippe Jul 18 '23 at 20:58