
I am working on a very simple IPsec setup and I keep getting XfrmInTmplMismatch errors on reception (after the ESP packet has been decapsulated), as reported by cat /proc/net/xfrm_stat. nft monitor all shows nothing.
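To keep an eye on which counter is actually moving, a small helper along these lines may be convenient. It is only a sketch: it assumes the usual "Name value" layout of /proc/net/xfrm_stat, the function name is made up for illustration, and the sample numbers are invented.

```python
def nonzero_xfrm_counters(text):
    """Return {counter_name: value} for every non-zero counter line."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        if value.isdigit() and int(value) > 0:
            counters[name] = int(value)
    return counters


# Illustrative sample in the "Name value" layout; the numbers are made up.
SAMPLE = """\
XfrmInError             0
XfrmInNoStates          0
XfrmInTmplMismatch      16
XfrmOutNoStates         0
"""

if __name__ == "__main__":
    # On a real host, read the text from /proc/net/xfrm_stat instead.
    print(nonzero_xfrm_counters(SAMPLE))
```

Running it before and after sending a test packet and comparing the two dictionaries shows which counter the packet bumped.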

These are the SAs and SPs I set:

[root@b7a933eb94dd /]# ip -s xfrm state 
src 172.20.0.6 dst 172.18.0.6
    proto esp spi 0x000004d2(1234) reqid 0(0x00000000) mode tunnel
    replay-window 0 seq 0x00000000 flag  (0x00000000)
    mark 0x1234/0xffffffff output-mark 0x1234/0xffffffff
    auth-trunc hmac(sha256) 0x0123456789abcdef0123456789abcdef (128 bits) 96
    enc cbc(aes) 0xfedcba9876543210fedcba9876543210 (128 bits)
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 uid 0
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      0(bytes), 0(packets)
      add 2022-08-06 14:45:00 use -
    stats:
      replay-window 0 replay 0 failed 0
src 172.18.0.6 dst 172.20.0.6
    proto esp spi 0x000004d2(1234) reqid 0(0x00000000) mode tunnel
    replay-window 0 seq 0x00000000 flag  (0x00000000)
    mark 0x1234/0xffffffff output-mark 0x1234/0xffffffff
    auth-trunc hmac(sha256) 0x0123456789abcdef0123456789abcdef (128 bits) 96
    enc cbc(aes) 0xfedcba9876543210fedcba9876543210 (128 bits)
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 uid 0
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      1760(bytes), 16(packets)
      add 2022-08-06 14:44:51 use 2022-08-06 14:46:20
    stats:
      replay-window 0 replay 0 failed 0
[root@b7a933eb94dd /]# ip -s xfrm policy
src 0.0.0.0/0 dst 0.0.0.0/0 uid 0
    dir fwd action allow index 23930 priority 0 ptype main share any flag  (0x00000000)
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      0(bytes), 0(packets)
      add 2022-08-06 14:54:07 use 2022-08-06 15:05:09
    mark 0x1234/0xffffffff 
    tmpl src 172.18.0.6 dst 172.20.0.6
        proto esp spi 0x000004d2(1234) reqid 0(0x00000000) mode tunnel
        level required share any 
        enc-mask ffffffff auth-mask ffffffff comp-mask ffffffff
src 0.0.0.0/0 dst 0.0.0.0/0 uid 0
    dir out action allow index 23921 priority 0 ptype main share any flag  (0x00000000)
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      0(bytes), 0(packets)
      add 2022-08-06 14:54:06 use -
    mark 0x1234/0xffffffff 
    tmpl src 172.20.0.6 dst 172.18.0.6
        proto esp spi 0x000004d2(1234) reqid 0(0x00000000) mode tunnel
        level required share any 
        enc-mask ffffffff auth-mask ffffffff comp-mask ffffffff
src 0.0.0.0/0 dst 0.0.0.0/0 uid 0
    dir in action allow index 23912 priority 0 ptype main share any flag  (0x00000000)
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      0(bytes), 0(packets)
      add 2022-08-06 14:54:06 use 2022-08-06 15:05:09
    mark 0x1234/0xffffffff 
    tmpl src 172.18.0.6 dst 172.20.0.6
        proto esp spi 0x000004d2(1234) reqid 0(0x00000000) mode tunnel
        level required share any 
        enc-mask ffffffff auth-mask ffffffff comp-mask ffffffff


I have been adding nft rules with counters at different stages, following https://thermalcircle.de/lib/exe/fetch.php?media=linux:packet-flow-ipsec-tunnel.png, and I am quite sure the ESP packet gets in and is decapsulated; see the SA counters:

src 172.18.0.6 dst 172.20.0.6
    proto esp spi 0x000004d2(1234) reqid 0(0x00000000) mode tunnel
    replay-window 0 seq 0x00000000 flag  (0x00000000)
    mark 0x1234/0xffffffff output-mark 0x1234/0xffffffff
    auth-trunc hmac(sha256) 0x0123456789abcdef0123456789abcdef (128 bits) 96
    enc cbc(aes) 0xfedcba9876543210fedcba9876543210 (128 bits)
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 uid 0
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      1760(bytes), 16(packets)
      add 2022-08-06 14:44:51 use 2022-08-06 14:46:20
    stats:
      replay-window 0 replay 0 failed 0

Then the decapsulated packet is recirculated: it passes prerouting and routing, but before it gets to forward it fails the lookup of the fwd and/or in SPs because the template does not match the SA that was just applied. The packet in question should reach output in this case.

I have checked that the packet is properly marked when it arrives at this policy lookup stage. In addition to using output-mark in the SA, I have added ad hoc nft rules to mark it, and I checked with nftrace that these rules are hit.

My understanding is that output-mark is not part of the hash key used to find the SA from the SP template, so that should not be a problem.
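To make that understanding concrete, here is a toy model of how an SA might be compared against a policy template. It is a simplification written for illustration, loosely inspired by the kernel's template check and not a copy of it; the class and function names are hypothetical. Under this assumption, only proto, SPI, reqid, mode and (for tunnel mode) the endpoint addresses are compared, and the output-mark plays no part.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    """An SA, reduced to the fields shown by `ip xfrm state`."""
    src: str
    dst: str
    proto: str
    spi: int
    reqid: int
    mode: str
    output_mark: int = 0


@dataclass(frozen=True)
class Template:
    """A policy `tmpl` entry; spi/reqid of 0 are treated as wildcards."""
    src: str
    dst: str
    proto: str
    spi: int
    reqid: int
    mode: str


def state_matches_template(state, tmpl):
    """Simplified comparison; note that output_mark is never consulted."""
    if state.proto != tmpl.proto or state.mode != tmpl.mode:
        return False
    if tmpl.spi != 0 and state.spi != tmpl.spi:
        return False
    if tmpl.reqid != 0 and state.reqid != tmpl.reqid:
        return False
    # Endpoint addresses only matter for non-transport (e.g. tunnel) mode.
    if state.mode != "transport":
        return (state.src, state.dst) == (tmpl.src, tmpl.dst)
    return True


# Values taken from the inbound SA and the "dir in" template above.
sa_in = State("172.18.0.6", "172.20.0.6", "esp", 0x4D2, 0, "tunnel", 0x1234)
tmpl_in = Template("172.18.0.6", "172.20.0.6", "esp", 0x4D2, 0, "tunnel")

if __name__ == "__main__":
    print(state_matches_template(sa_in, tmpl_in))
    print(state_matches_template(replace(sa_in, output_mark=0), tmpl_in))
```

In this model the inbound SA matches the inbound template with or without the output-mark, so a mismatch would have to come from somewhere else.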

Am I missing something here?

Thanks in advance.

1 Answer


OK, so I figured out what was happening, more or less, and it was related to my environment. The decapsulated packet was routed out of my Docker container's eth0, only to reach the destination container and come back via the same eth0, where it was routed and sent out again through eth0 to a different destination. In this last step it was matching the output policy, but the state lookup failed, which was correct because at that point it was no longer a decapsulated ESP packet. Making the out policy's template more restrictive fixed the problem.
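For anyone hitting the same counter, the following toy model shows one way a packet that is no longer ESP-protected can end up as a template mismatch. This is my reading of the situation and a deliberate simplification, not kernel code: it assumes that a policy with a `level required` template is checked against the list of SAs that actually decapsulated the packet, and that a plain packet carries an empty list. The function name and tuple layout are made up for illustration.

```python
def check_policy_templates(applied_sas, templates):
    """Walk the templates in order and look for each one among the SAs
    that were applied to the packet.

    applied_sas: list of (proto, spi, mode) tuples, empty for a plain packet.
    templates:   list of (proto, spi, mode, required) tuples.
    Returns "accept" or the name of the counter that would be bumped.
    """
    next_idx = 0
    for proto, spi, mode, required in templates:
        found = False
        for i in range(next_idx, len(applied_sas)):
            if applied_sas[i] == (proto, spi, mode):
                next_idx = i + 1
                found = True
                break
        if not found and required:
            return "XfrmInTmplMismatch"
    return "accept"


# Template from the policies above: ESP, SPI 0x4d2, tunnel mode, level required.
TEMPLATES = [("esp", 0x4D2, "tunnel", True)]

if __name__ == "__main__":
    # Packet that was really decapsulated by the ESP SA.
    print(check_policy_templates([("esp", 0x4D2, "tunnel")], TEMPLATES))
    # Plain packet that came back through eth0: no SA was applied to it.
    print(check_policy_templates([], TEMPLATES))
```

With catch-all 0.0.0.0/0 selectors, the marked plain packet still selects the policy, and the required template then has nothing to match against; a narrower policy keeps such traffic from selecting it in the first place.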