
I have created a PCIe driver for Linux v4.1.15 (non PREEMPT_RT) with a single IRQ which is generated from an MSI from an FPGA. My ISR is:

static irqreturn_t int_handler(int irq, void *dev_id)
{
    /* my_lock guards msi_counter, which one other code path decrements. */
    spin_lock(&my_lock);
    msi_counter++;
    spin_unlock(&my_lock);

    return IRQ_HANDLED;
}

The MSI is sent every 300 us from the FPGA (Cyclone V) and my ISR is fired very quickly and is handled without fail (latency < 5 us). The problem is that about every 3 s (3 s with relatively small jitter) the latency for my ISR jumps to about 1.5 ms to 2 ms; this was measured with a scope by changing my ISR to write a value back to the FPGA and monitoring a pin from the FPGA. The spin_lock for msi_counter is only used in one other place in my code but only to decrement the counter in the same way my ISR increments it. I am using an iMX6 quad core CPU at 1 GHz and the system is using a bare-bones Yocto image (core-image-minimal) so nothing is really running on the CPU. The only other hardware the CPU is connected to is the Ethernet but there is very little data being sent and the data updates more frequently than 3 s.

Questions:

  • How can I identify why Linux periodically increases the latency for my ISR?
  • What can I do to decrease the latency?

Other Info:

  • I've changed the flags passed to request_irq() to IRQF_NO_SUSPEND | IRQF_NO_THREAD, as well as other values, and nothing seems to fix this periodic latency increase.

  • Also, cat /proc/interrupts shows that my ISR only ever runs on core 0 (the first of four cores). I don't know if this has any meaning, but I figure it is worth mentioning.

  • Data from the FPGA to CPU is transferred once per MSI (once per 300 us) and the time to transfer the data is a steady 17 us. No data is ever sent from the CPU to the FPGA.

Final Solution:

I created a PREEMPT_RT kernel image and registered my ISR with request_irq() using the flags IRQF_NO_SUSPEND | IRQF_NO_THREAD | IRQF_PERCPU. I also replaced the spin_lock with an atomic_t, but the big improvement came from PREEMPT_RT. I now get an average latency of ~12 us.
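For readers who want to see what the counter change looks like, here is a minimal user-space sketch using C11 atomics as a stand-in for the kernel's atomic_t. It is not the driver code: on_msi() and consume_one() are hypothetical names, and the "only decrement when positive" behaviour in consume_one() is an assumption, since the post only says the other code path decrements the counter. In the driver the corresponding calls would presumably be atomic_inc() and atomic_dec() (or atomic_dec_if_positive()).

```c
#include <stdatomic.h>

/* User-space stand-in for an atomic_t msi_counter in the driver. */
static atomic_int msi_counter = 0;

/* What the ISR body reduces to: a single lock-free increment. */
static void on_msi(void)
{
    atomic_fetch_add(&msi_counter, 1);
}

/*
 * Consumer side (hypothetical): take one pending event if there is one.
 * Returns 1 if the counter was decremented, 0 if it was already zero.
 */
static int consume_one(void)
{
    int cur = atomic_load(&msi_counter);

    while (cur > 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&msi_counter, &cur, cur - 1))
            return 1;
    }
    return 0;
}
```

Because neither side ever spins waiting for the other, the handler's execution time no longer depends on what the consumer is doing.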

user2205930
  • You could use an `atomic` type to avoid the spinlock if increment and decrement are all you need. You could also try removing the spinlock to see whether the latency is there or in the IRQ handler waiting to run. Then consider using ftrace or LTTng to see what was running before the IRQ was handled. – TrentP Feb 11 '17 at 05:17
  • @TrentP - I'll give both of those a shot when I get a chance. Do you think that the `spin_lock` could really be the issue? I thought `spin_lock` was fast and never caused the ISR to sleep. – user2205930 Feb 11 '17 at 22:06

1 Answer


You can take a timestamp immediately before and immediately after spin_lock. If the difference between the two exceeds some threshold, increment a second counter. That counter tells you how often the delay is due to spin_lock. If it matches how often you see the latency spike, you can then try to figure out why the other lock holder doesn't release the lock promptly. Can it get preempted while holding the lock?

Another thing to look into: can spin_lock itself take a long time with a low, but non-zero, probability?
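A rough user-space sketch of that measurement is below. Everything in it is an assumption made for illustration: the 100 us threshold, the names now_ns(), record_wait() and handler_body(), and the atomic_flag spin loop standing in for the kernel's spin_lock. In the driver itself one would presumably read the time with something like ktime_get_ns() around the real spin_lock call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical threshold: waits longer than 100 us count as "slow". */
#define SLOW_LOCK_THRESHOLD_NS 100000ULL

static atomic_flag my_lock = ATOMIC_FLAG_INIT; /* stand-in for the spinlock */
static unsigned long msi_counter;
static unsigned long slow_lock_count;          /* waits above the threshold */
static uint64_t worst_wait_ns;                 /* longest wait seen so far */

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Update the statistics for one measured lock wait. */
static void record_wait(uint64_t wait_ns)
{
    if (wait_ns > worst_wait_ns)
        worst_wait_ns = wait_ns;
    if (wait_ns > SLOW_LOCK_THRESHOLD_NS)
        slow_lock_count++;
}

/* Handler body with the lock acquisition bracketed by two timestamps. */
static void handler_body(void)
{
    uint64_t t0 = now_ns();

    while (atomic_flag_test_and_set_explicit(&my_lock, memory_order_acquire))
        ; /* spin until the other holder releases the lock */

    uint64_t wait_ns = now_ns() - t0;

    msi_counter++;
    record_wait(wait_ns); /* statistics are updated while holding the lock */
    atomic_flag_clear_explicit(&my_lock, memory_order_release);
}
```

If slow_lock_count grows at roughly one per 3 s, the lock is the likely culprit; if it stays at zero while the scope still shows the spike, the delay is happening before the handler starts running.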

grovkin
  • The timer idea is interesting but it wouldn't work for my application since the MSI arrives when data is ready so I need some sort of "handshaking" between the CPU and FPGA when that data is ready. Although I am thinking of using your timer idea to see if `spin_lock` ever gets preempted. I don't know a lot about `spin_lock` but I thought it was immune to preemption. – user2205930 Feb 11 '17 at 22:09
  • I meant that the other thread holding the lock could be getting preempted while holding the lock. As for the other suggestion, spin_lock is fast on average, but it has a non-zero variance (because it is spinning). That means there is a small probability of it taking longer than the time you can tolerate. – grovkin Feb 11 '17 at 23:13