
I was reading about packet flow in the receive path from NIC interrupt handler to user space.

I wanted to know up to which point the newly allocated skbuff remains in bottom-half context.

Taking the snull_rx() code from LDD:

void snull_rx(struct net_device *dev, struct snull_packet *pkt)
{
    struct sk_buff *skb;
    struct snull_priv *priv = netdev_priv(dev);

    /*
     * The packet has been retrieved from the transmission
     * medium. Build an skb around it, so upper layers can handle it
     */
    skb = dev_alloc_skb(pkt->datalen + 2);
    if (!skb) {
        if (printk_ratelimit())
            printk(KERN_NOTICE "snull rx: low on mem - packet dropped\n");
        priv->stats.rx_dropped++;
        goto out;
    }
    memcpy(skb_put(skb, pkt->datalen), pkt->data, pkt->datalen);

    /* Write metadata, and then pass to the receive level */
    skb->dev = dev;
    skb->protocol = eth_type_trans(skb, dev);
    skb->ip_summed = CHECKSUM_UNNECESSARY; /* don't check it */
    priv->stats.rx_packets++;
    priv->stats.rx_bytes += pkt->datalen;
    netif_rx(skb);
  out:
    return;
}

So after netif_rx(skb), up to what point will the skb remain in bottom-half context?

Thanks.

Haswell
  • netif_rx() is the entry point in the Linux kernel for receiving packets from network interface drivers. netif_rx() doesn't do much (and should not do much) processing; it just queues the packet and returns. The queued packets are processed by the kernel later, in softirq context. – toyoubala Mar 15 '16 at 13:56

2 Answers


EDIT: I've written a blog post outlining the entire Linux network stack (the receive path) that provides lots of detailed information; take a look.

The answer is complicated, but yes, the netfilter code runs in softirq context.

The way the flow works is like this:

  1. A packet arrives, and the driver's interrupt handler schedules NAPI polling.
  2. The NAPI poll loop runs in softirq context (NET_RX_SOFTIRQ in /proc/softirqs) and harvests packets from memory (where the NIC DMA'd the data).
  3. The softirq can only consume up to its budget of packets, or run until it hits a time limit on packet processing; you can find this code here. (A simplified sketch of a driver's poll function is shown after this list.)
  4. This prevents the softirq from consuming the entire CPU.
  5. Eventually, the function __netif_receive_skb_core is called. The exact path to this function depends on the driver, but for e1000e the path is:
    1. NAPI softirq calls e1000e_poll
    2. e1000e_poll calls e1000_clean_rx_irq
    3. e1000_clean_rx_irq calls e1000_receive_skb
    4. e1000_receive_skb calls napi_gro_receive
    5. napi_gro_receive calls napi_skb_finish
    6. napi_skb_finish calls netif_receive_skb
    7. netif_receive_skb calls __netif_receive_skb
    8. __netif_receive_skb calls __netif_receive_skb_core
  6. Depending on whether or not you are using receive packet steering, the code paths here diverge a little bit.
  7. In either case, eventually packets are delivered to the protocol layer here.
  8. If we are looking at IP as our protocol of choice, packets are handed up to ip_rcv which will also check netfilter.
  9. The packet continues through each of the protocol stacks until it is queued to a socket's receive buffer with a call to sock_queue_rcv_skb. For example, UDP does this here from a function called __udp_queue_rcv_skb.
  10. The function sock_queue_rcv_skb queues the data to the socket receive buffer. You can find this code here. (A simplified sketch of this step is shown after the notes below.)
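
To make step 3 concrete, here is a rough sketch of what a driver's NAPI poll callback looks like. The mydrv_poll, mydrv_adapter, mydrv_clean_one_rx and mydrv_enable_rx_irq names are made up for illustration; only the overall budget pattern mirrors real drivers such as e1000e_poll:

/*
 * Rough sketch of a driver's NAPI poll callback (illustration only; the
 * mydrv_* names are not real kernel symbols). It processes at most
 * `budget` packets per invocation of the NET_RX softirq, which is what
 * keeps the softirq from monopolizing the CPU.
 */
static int mydrv_poll(struct napi_struct *napi, int budget)
{
    struct mydrv_adapter *adapter =
        container_of(napi, struct mydrv_adapter, napi);
    int work_done = 0;

    while (work_done < budget) {
        /* Pull one frame that the NIC already DMA'd into memory. */
        struct sk_buff *skb = mydrv_clean_one_rx(adapter);

        if (!skb)
            break;  /* RX ring is empty */

        skb->protocol = eth_type_trans(skb, adapter->netdev);
        napi_gro_receive(napi, skb);  /* hand the skb up, still in softirq */
        work_done++;
    }

    /* Ring drained before the budget ran out: leave polling mode and
     * re-enable the RX interrupt. */
    if (work_done < budget) {
        napi_complete(napi);
        mydrv_enable_rx_irq(adapter);
    }

    return work_done;
}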

Some notes:

  • You can adjust the budget for NAPI by changing the sysctl net.core.netdev_budget. The higher the budget, the more packets the softirq will process in one run, but the CPU will have less time for running user processes.
  • If your NIC supports multiple RX queues, you can distribute the incoming packet processing load between multiple CPUs.
  • If your NIC does not support multiple RX queues, you can use Receive Packet Steering to distribute the packet processing load across multiple CPUs.
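
As a rough illustration of the last step above (sock_queue_rcv_skb), the following sketch shows what queueing to a socket's receive buffer boils down to. It is not the actual kernel source; details such as memory charging and socket filters are omitted:

/*
 * Simplified sketch of what sock_queue_rcv_skb amounts to (illustration
 * only, not the actual kernel source).
 */
static int sketch_sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
    /* Drop the packet if the socket receive buffer is already full. */
    if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >
        (unsigned int)sk->sk_rcvbuf)
        return -ENOMEM;

    /* We are still in softirq context here. */
    skb_queue_tail(&sk->sk_receive_queue, skb);

    /* Wake up any process sleeping in recv()/read(); from this point on
     * the data is consumed in process context. */
    sk->sk_data_ready(sk);
    return 0;
}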

Let me know if you have any other questions about this process that I can answer.

Joe Damato
  • Nice answer, but this process applies only while you are using NAPI, which is not always the case. Today many processors can chain interrupts with very low latency, and NAPI is not required in that case – Liran Ben Haim Apr 06 '16 at 19:08
  • Take this one for example: http://lxr.free-electrons.com/source/drivers/net/ethernet/allwinner/sun4i-emac.c – Liran Ben Haim Apr 07 '16 at 06:49
  • If you search lxr you'll find many drivers that don't use NAPI, and if you compare kernel versions you can find drivers that used NAPI in an older version but not in a newer one. On multi-core processors there is very little trade-off between using NAPI or not – Liran Ben Haim Apr 07 '16 at 06:53
  • The above driver is for the ARM architecture and does not use PCI at all (it is not supported). As you mentioned, many drivers use NAPI, but many, especially on embedded boards, do not – Liran Ben Haim Apr 07 '16 at 09:13
  • BTW, Allwinner is a very large processor company with millions of devices around the world based on their chips. Many cheap Android devices are based on Allwinner (China). You can also find examples from bigger companies like Apple: http://lxr.free-electrons.com/source/drivers/net/ethernet/apple/bmac.c – Liran Ben Haim Apr 07 '16 at 11:41
  • Apple started writing it in 1998. The last version is up to date with kernel 4.5. There are more, and popular, devices that don't implement NAPI – Liran Ben Haim Apr 07 '16 at 16:07

The code above runs in hardware interrupt context; netif_rx() queues the packet and signals the kernel to continue processing in softirq context (NET_RX_SOFTIRQ).
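
Roughly speaking, netif_rx() enqueues the skb on the current CPU's backlog queue and raises the NET_RX softirq so the hard-IRQ handler can return quickly. A simplified sketch of that idea (not the actual kernel source):

/*
 * Simplified sketch of the idea behind netif_rx() (illustration only).
 */
static int sketch_netif_rx(struct sk_buff *skb)
{
    struct softnet_data *sd;
    unsigned long flags;

    local_irq_save(flags);
    sd = this_cpu_ptr(&softnet_data);

    /* Queue the skb for later processing by the NET_RX softirq. */
    __skb_queue_tail(&sd->input_pkt_queue, skb);

    /* Ask the kernel to run net_rx_action() in softirq context. */
    __raise_softirq_irqoff(NET_RX_SOFTIRQ);

    local_irq_restore(flags);
    return NET_RX_SUCCESS;
}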

Liran Ben Haim
  • netif_rx queues the packet in the backlog queue? If everything after that ran in SOFTIRQ, would no process ever be able to get control of the CPU? – Haswell Mar 23 '16 at 10:19
  • Not everything runs in SOFTIRQ. At the transport layer (TCP, for example) the packet is queued again in the socket backlog, and the rest is handled in process context; see for example the code in tcp_v4_rcv (a simplified sketch of this pattern appears after this comment thread) – Liran Ben Haim Mar 23 '16 at 21:03
  • Can you direct me to the section of code which does the "queued again in the socket backlog to be later processed in process context" part? tcp_v4_rcv will run in process context, but I wanted to see the code which does this transition from softirq to process context, particularly to see which locks are used. Please help. Thanks. – Haswell Apr 06 '16 at 05:17
  • It's a long process, but you can find a good explanation here: https://people.cs.clemson.edu/~westall/853/notes/tcprecv.pdf – Liran Ben Haim Apr 06 '16 at 06:42
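
For reference, the softirq-to-process handoff asked about above revolves around the socket lock. Below is a heavily simplified sketch (not the actual kernel source; the sketch_* function names are made up) of the pattern used by tcp_v4_rcv on the softirq side and tcp_recvmsg on the process side:

/* Softirq side: roughly what tcp_v4_rcv does, still in NET_RX softirq. */
static void sketch_tcp_rcv_softirq(struct sock *sk, struct sk_buff *skb)
{
    bh_lock_sock_nested(sk);  /* spinlock half of the socket lock */
    if (!sock_owned_by_user(sk)) {
        /* No process holds the socket: process the segment right here. */
        tcp_v4_do_rcv(sk, skb);
    } else {
        /* A process holds the socket via lock_sock(): defer the segment to
         * the socket backlog; it is processed when release_sock() runs. */
        if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
            kfree_skb(skb);  /* backlog full: drop */
    }
    bh_unlock_sock(sk);
}

/* Process side: roughly what tcp_recvmsg does on behalf of recvfrom(). */
static void sketch_tcp_recvmsg(struct sock *sk)
{
    lock_sock(sk);     /* marks the socket "owned by user" */
    /* ... copy data from sk->sk_receive_queue to user space ... */
    release_sock(sk);  /* also drains the backlog queued by the softirq */
}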