IPv6 connectivity suddenly lost, IPv6 neighbour router status becomes STALE at the same time. How can I avoid it?

Question

I have a VM on a host with bridged networking (hence, with its own MAC address). Both host and VM run CentOS. Their network is managed by simple /etc/sysconfig/network-scripts/ifcfg-enpXsY files which contain the static IP addresses. IPv4 works just fine.

I have assigned an IPv6 address to the VM (the host also has one) which is routed correctly in the data centre. Most connections use IPv4, however (no DNS AAAA entry for the machine yet, still testing IPv6).

When I boot up the VM it has full IPv6 connectivity. However, after a while IPv6 connectivity stops working (IPv6 magic?). I have narrowed the problem down to the neighbour (ARP/NDISC cache) data:

When IPv6 is not working (I cannot ping or connect over IPv6, in or out), I see:

# ip -6 neighbour 
fe80::1 dev enp1s2 lladdr 0c:86:72:2e:04:28 router STALE

Fix/workaround to refresh the cache:

# ip -6 neighbour flush dev enp1s2
# ip -6 neighbour
(empty, as expected)

Then ping6 the host from within the VM to fill the cache:

# ping6 2912:1375:23:9a6c::2
PING 2912:1375:23:9a6c::2(2912:1375:23:9a6c::2) 56 data bytes
64 bytes from 2912:1375:23:9a6c::2: icmp_seq=1 ttl=64 time=2.35 ms
64 bytes from 2912:1375:23:9a6c::2: icmp_seq=2 ttl=64 time=0.468 ms
^C
# ip -6 neighbour
fe80::1 dev enp1s2 lladdr 0c:86:72:2e:04:28 router REACHABLE
2912:1375:23:9a6c::2 dev enp1s2 lladdr 08:21:4b:b7:f8:31 DELAY

IPv6 neighbour/ARP table restored to validity and connectivity is working in and out!

So my questions are:

Why does the cache become stale?
What can I do to avoid it?
Why/how does the command above fix it?

Of course I could run those commands in a cron job (how often?) but I suppose that cannot really be needed for IPv6 to work in general?

PS: I used a script for tests: The IPv6 stack breaks down about every 20 minutes. Can that be explained by RFCs?

PPS: Firewall config (shortened output, hopefully all relevant bits):

# ip6tables -nvL
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 9023  709K ACCEPT     icmpv6    !lo    *       ::/0                 ::/0                
Chain OUTPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 9360  785K ACCEPT     icmpv6    *      !lo     ::/0                 ::/0

So, ICMPv6 accepted in/out on the VM. Do I need to check filtering on the host?

Have you set up some firewall rules that would block ICMPv6 (including NDP)? — Håkan Lindqvist, Sep 22 '21 at 07:15
@HåkanLindqvist Thanks, probably not filtered, added `ip6tables` output. — Ned64, Sep 22 '21 at 10:44
There are some details missing: Who is the datacentre? How is bridged networking set up in your hypervisor (and what is it)? What is the _complete_ state of the IPv6 firewall? — Michael Hampton, Sep 22 '21 at 13:11

score 1 · Answer 1 · answered Jan 14 '22 at 08:49

Generally, the STALE state is a good thing; it is actually perfectly acceptable for an entry to be STALE.

Let's look at RFC 4861, section 5.1:

  STALE       The neighbor is no longer known to be reachable but until traffic is sent to the neighbor, no attempt should be made to verify its reachability.

The neighbor is no longer known to be reachable (the timer expired, there has been no traffic lately, or similar), and reachability will be verified once traffic is sent to the neighbor again.

So there isn't any issue if you can send traffic to the neighbor again.
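
To watch the transitions described above, the state of one cache entry can be pulled out of the `ip -6 neighbour` output. This is only a minimal sketch: `neigh_state` is a hypothetical helper name, and it assumes the state is the last field of each line, as in the output shown in the question.

```shell
#!/bin/sh
# neigh_state ADDR  (hypothetical helper, not part of iproute2)
# Reads `ip -6 neighbour` style lines on stdin and prints the state
# (assumed to be the last field) of the entry whose address is ADDR.
neigh_state() {
    awk -v addr="$1" '$1 == addr { print $NF }'
}

# Possible use with the router and interface from the question:
#   ip -6 neighbour show dev enp1s2 | neigh_state fe80::1
```

Logging this value with a timestamp, once while IPv6 works and once after it stops, would show whether the entry stays STALE or moves on to DELAY, PROBE or FAILED when traffic is sent.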

Omid Estaji


Thanks. My issue is that the host loses IPv6 connectivity - it stops replying to incoming requests when the STALE state is shown. I suppose both issues have the same cause? How exactly do the `ip r` and `ping` commands I detailed in my Question restore connectivity? – Ned64 Jan 14 '22 at 12:37
The problem is that I CANNOT send traffic to the neighbour without these extra steps and the host cannot be reached from the outside, either. So: How and why is connectivity lost? How do the `ip -6 neighbour` and `ping` commands restore connectivity? Can this be done without these steps? Do all IPv6 machines execute these steps periodically, or why do they not lose connectivity? – Ned64 Jan 14 '22 at 15:47
The STALE state is a normal part of IPv6 neighbor discovery. When a host boots, it sends, among other things, a gratuitous Neighbor Advertisement, but a receiver does not create a new cache entry when it sees a gratuitous NA. This differs from the legacy protocol (IPv4), where any gratuitous ARP we see does add an entry to the cache. In IPv6, the gratuitous NA is only used if it updates an entry that already exists. So STALE means: not currently communicating, waiting for the next queued packet. If we then send a packet, the state may go to DELAY, and the stack will wait for upper-layer protocols to return traffic. – Omid Estaji Jan 15 '22 at 16:38
