EDIT: Solution found. I will post it once I have written it up and verified it.

Having come across some head-scratching behavior with nftables, I am hoping for some community insights.

When I use the ruleset below in a QEMU-KVM guest, Ethernet frames in chain arp-out-host-wan0 that nftables should accept are not matched. nftables' own logging shows the frames as, from what I can tell, nonsense, while tcpdump in the guest and Wireshark on the host both show the frames as expected.

table arp filter {
    chain input {
        type filter hook input priority filter; policy drop;
        iifname "enp1s0" counter packets 5 bytes 140 jump arp-in-host-wan0
        log prefix "nft: arp->input dropped: " flags all limit rate 3/second
        counter packets 0 bytes 0
    }

    chain output {
        type filter hook output priority filter; policy drop;
        oifname "enp1s0" counter packets 5 bytes 210 jump arp-out-host-wan0
        log prefix "nft: arp->output dropped: " flags all limit rate 3/second
        counter packets 0 bytes 0
    }

    chain arp-in-host-wan0 {
        ether daddr 52:54:00:ee:10:e6 limit rate 3/second counter packets 5 bytes 140 accept
        ether daddr ff:ff:ff:ff:ff:ff limit rate 3/second counter packets 0 bytes 0 accept
        counter packets 0 bytes 0 return
    }

    chain arp-out-host-wan0 {
                
        ### Broken rule not matching frames that should match
        ether saddr 52:54:00:ee:10:e6 limit rate 3/second counter packets 0 bytes 0 accept
                
        ### Wildcard rule to log non-matching frames in chain
        log prefix "nft: arp->output ALLOWED: " flags all

        ### Wildcard rule to let non-matching traffic pass
        counter packets 5 bytes 210 accept

        counter packets 0 bytes 0 return
    }
}

QEMU-KVM guest (journalctl -k), traffic as picked up by the nftables wildcard rule above (seemingly nonsense; note the non-standard EtherType, ARP HTYPE, ARP PTYPE, and ARP OPCODE): [image]

QEMU-KVM guest (tcpdump arp -vlenx), the very same traffic recognized properly: [image]

QEMU-KVM host (Wireshark capture from the relevant bridge device), the very same traffic recognized properly: [image]

QEMU-KVM guest info: [image]

I would love to understand what is going on here. If there is any additional info I can provide, just let me know - thanks!

UPDATE 1: Same behavior is observed when logging with NFLOG (syntax "log group").

Frame 1: 76 bytes on wire (608 bits), 76 bytes captured (608 bits) on interface nflog:30, id 0
    Interface id: 0 (nflog:30)
        Interface name: nflog:30
    Encapsulation type: NFLOG (141)
    Arrival Time: Jun 28, 2022 09:18:44.633102000 CDT
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1656425924.633102000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 76 bytes (608 bits)
    Capture Length: 76 bytes (608 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: nflog:arp]
Linux Netfilter NFLOG
    Family: ARP (3)
    Version: 0
    Resource id: 30
    TLV Type: NFULA_PACKET_HDR (1), Length: 8
        Length: 8
        .000 0000 0000 0001 = Type: NFULA_PACKET_HDR (1)
        HW protocol: ARP (0x0806)
        Netfilter hook: Local in (1)
    TLV Type: NFULA_PREFIX (10), Length: 5
        Length: 5
        .000 0000 0000 1010 = Type: NFULA_PREFIX (10)
        Prefix: 
    TLV Type: NFULA_IFINDEX_OUTDEV (5), Length: 8
        Length: 8
        .000 0000 0000 0101 = Type: NFULA_IFINDEX_OUTDEV (5)
        IFINDEX_OUTDEV: 2
    TLV Type: NFULA_PAYLOAD (9), Length: 46
        Length: 46
        .000 0000 0000 1001 = Type: NFULA_PAYLOAD (9)
Address Resolution Protocol
    Hardware type: Unknown (21076)
    Protocol type: Unknown (0x0064)
    Hardware size: 98
    Protocol size: 108
    Opcode: Unknown (21076)
[Malformed Packet: ARP/RARP]
    [Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
        [Malformed Packet (Exception occurred)]
        [Severity level: Error]
        [Group: Malformed]
Tanel Rebane
  • Do you get the same result if you copy the packets via NFLOG (where you currently have `log prefix`, log to a group, e.g. `log group 30` then read the packets in *shark via `-i nflog:30`)? – anx Jun 28 '22 at 01:33
  • I do indeed; "Malformed Packet". I have added the output to the original post. The only difference is that while regular "log" lights up twice, NFLOG ("log group") only lights up once. – Tanel Rebane Jun 28 '22 at 14:38
  • For the record, `21076` is `0x5254`, `98` is `0x62`, and `108` is `0x6c`. Together with `0x0064`, they are `52:54:00:64:62:6c`. – Tom Yan Jun 28 '22 at 15:10
  • What's the `-netdev/-device` combo in use? Is the host on the same kernel version? I wonder if the wrong offset is caused by extra headers for features that could be explicitly disabled on the qemu command line. – anx Jun 29 '22 at 05:00
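
The arithmetic in the comment above can be reproduced with a short Python sketch. It builds a hypothetical ARP request frame and decodes the fixed ARP header twice: once from offset 0 (the start of the Ethernet header) and once from offset 14 (the start of the payload). The destination MAC is the one reconstructed in the comment; the source MAC is assumed to be the guest's `52:54:00:ee:10:e6`, and the IP addresses and the helper name `parse_arp_header` are made up for illustration. This only demonstrates that a 14-byte offset error would explain the logged values; it does not show where such an error would come from.

```python
import struct


def parse_arp_header(data: bytes) -> dict:
    """Decode the 8 fixed bytes at the start of an ARP packet."""
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", data[:8])
    return {"htype": htype, "ptype": ptype, "hlen": hlen, "plen": plen, "oper": oper}


# Ethernet header: destination MAC, source MAC, EtherType (ARP).
dst_mac = bytes.fromhex("52540064626c")  # from the comment above
src_mac = bytes.fromhex("525400ee10e6")  # assumption: the guest's own MAC
eth_header = dst_mac + src_mac + struct.pack("!H", 0x0806)

# A well-formed ARP request (addresses are illustrative only).
arp_packet = (
    struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # Ethernet, IPv4, 6, 4, request
    + src_mac + bytes([192, 0, 2, 1])          # sender MAC / IP
    + bytes(6) + bytes([192, 0, 2, 2])         # target MAC / IP
)
frame = eth_header + arp_packet

wrong = parse_arp_header(frame)       # offset 0: Ethernet header read as ARP
right = parse_arp_header(frame[14:])  # offset 14: the actual ARP packet

print("offset 0: ", wrong)
print("offset 14:", right)
```

Decoding from offset 0 yields hardware type 21076, protocol type 0x0064, hardware size 98, protocol size 108, and opcode 21076, which are the values in the Wireshark output in the question.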

1 Answer

The reason is probably that arp family tables are traversed before the ARP packets are routed (I'm not sure that's the right way to put it, but the point is that the source address of the Ethernet frame has not been set yet).

To match the Ethernet source address of the ARP packets, you should match with `arp saddr ether` instead of `ether saddr`. (For inbound traffic, you might even want to check the addresses in both the Ethernet frame header and the ARP packet header.)
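
To make the distinction between the two source addresses concrete, here is a minimal Python sketch that pulls both out of a raw frame: the source address in the Ethernet header (what `ether saddr` is meant to match) and the sender hardware address inside the ARP packet (what `arp saddr ether` is meant to match). The helper names, the frame builder, and the second MAC address are hypothetical and exist only for illustration; the sketch assumes an untagged Ethernet frame with a 14-byte header.

```python
import struct

ETH_HEADER_LEN = 14  # assumption: no VLAN tag


def mac_str(raw: bytes) -> str:
    """Format raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def source_addresses(frame: bytes) -> tuple:
    """Return (Ethernet header source MAC, ARP sender hardware address)."""
    eth_src = frame[6:12]
    arp = frame[ETH_HEADER_LEN:]
    hlen = arp[4]              # hardware address length field
    sha = arp[8:8 + hlen]      # sender hardware address follows the fixed header
    return mac_str(eth_src), mac_str(sha)


def build_arp_request(eth_src: bytes, arp_sha: bytes) -> bytes:
    """Build an illustrative broadcast ARP request with independent source fields."""
    eth = bytes.fromhex("ffffffffffff") + eth_src + struct.pack("!H", 0x0806)
    arp = (
        struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
        + arp_sha + bytes([192, 0, 2, 1])
        + bytes(6) + bytes([192, 0, 2, 2])
    )
    return eth + arp


guest_mac = bytes.fromhex("525400ee10e6")
other_mac = bytes.fromhex("525400aabbcc")  # hypothetical second address

honest = source_addresses(build_arp_request(guest_mac, guest_mac))
spoofed = source_addresses(build_arp_request(guest_mac, other_mac))

print("honest: ", honest, "match" if honest[0] == honest[1] else "MISMATCH")
print("spoofed:", spoofed, "match" if spoofed[0] == spoofed[1] else "MISMATCH")
```

Nothing forces the two fields to agree, which is why checking both can be worthwhile for inbound traffic.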

Tom Yan
  • I think you're right; I reached the same conclusion after inspecting the logs via NFLOG. However, to be sure, I injected similar rules into the `ipv4 output` hook and the `ipv4 postrouting` hook. The results are interesting and led to a workaround for filtering L2 traffic. I'll share it here as soon as I get time to synthesize the output and verify the workaround. I absolutely agree that `arp saddr {ip|ether}` can be useful (e.g. on a virtual bridge for guests beyond your control), but it isn't a full-on replacement for `ether saddr`. – Tanel Rebane Jun 28 '22 at 17:05
  • Actually, before I wrote this answer, I didn't pay much attention to the "malformed" details you mentioned. After realizing that `52:54:00:64:62:6c` (which is the destination MAC address, right?) is treated as the beginning of the ARP packet, it looks to me like a kernel bug/regression: instead of treating only the Ethernet frame payload as the ARP packet, it also includes the Ethernet frame header (i.e. a wrong offset), starting from the destination MAC at least. The other `21076` (`52:54`) in the OPCODE field appears to be the first two octets of the source MAC address of the Ethernet frame. – Tom Yan Jun 29 '22 at 01:36
  • And if my new theory is right, then my original theory is probably wrong. – Tom Yan Jun 29 '22 at 01:38
  • Also, my point wasn't that `ether saddr` should be replaced (at least not if it is possible to use it). I merely meant to point out that `ether saddr|daddr` do not match the "SHA"/"THA" fields of the ARP packet but the source/destination address in the Ethernet frame header, and AFAIK it is *possible* that they are not identical (for example, if someone manipulates only one of them on purpose, for malicious/spoofing reasons or whatever), which is why I mentioned that you *might* want to check *both*, at least for inbound traffic. – Tom Yan Jun 29 '22 at 01:46