Why?
Between the two snippets, I changed two things:
- I used the TOS field of the IP header instead of firewall marks, which are managed internally by the kernel and its modules.
- I marked the returning packets.
I lied (by omission): rp_filter
I forgot to say that the rp_filter parameter is set to 1 on all interfaces.
According to the kernel documentation, the value 1 stands for strict reverse path checking, as defined in RFC 3704.
To summarize, when a packet comes in through an interface, the kernel swaps its source and destination IP addresses and tries to route this fake reversed packet. If the chosen route goes out through the interface the packet came in on, the check passes. Otherwise, the packet is dropped.
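The check can be modeled in a few lines of shell. This is a toy model, not the kernel's code: the routing table is a hypothetical `case` statement in which the LAN (192.168.1.0/24, behind a hypothetical eth_lan) is directly connected and everything else follows the main table's default route through eth_adsl. 203.0.113.10 stands for an arbitrary remote host.

```shell
#!/bin/sh
# Toy model of strict reverse path filtering (rp_filter = 1, RFC 3704).
# Hypothetical routing table: LAN behind eth_lan, default route via eth_adsl.

# route_oif DEST: print the interface a lookup towards DEST would use.
route_oif() {
  case "$1" in
    192.168.1.*) echo eth_lan ;;
    *)           echo eth_adsl ;;
  esac
}

# rp_strict_ok IN_IFACE SRC: swap source and destination, i.e. route back
# towards SRC, and accept only if that route leaves through IN_IFACE.
rp_strict_ok() {
  [ "$(route_oif "$2")" = "$1" ]
}

# The same remote source seen on both public interfaces.
for iface in eth_adsl eth_livebox; do
  if rp_strict_ok "$iface" 203.0.113.10; then
    echo "$iface: accepted"
  else
    echo "$iface: dropped"
  fi
done
# prints "eth_adsl: accepted" then "eth_livebox: dropped"
```

The packet arriving on eth_livebox is dropped only because the reversed lookup picks eth_adsl, which is exactly the situation described next.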
So, with the rules from "What should work", the returning packet is not marked with 1, and the strict reverse path check fails. The packet comes in through eth_livebox, but without a mark, the reversed lookup uses the main table, which says to go out through eth_adsl. The check fails, and this is the reason for change no. 2.
Why TOS and not MARK?
Yes, of course, I tried -j MARK on returning packets, and it does not work. After a few hours of digging through old mailing-list messages, I found this one:
OK, looking at fib_validate_source(), it looks like how rp_filter
works is just that the kernel takes the packet, reverses src & dst
addrs and interfaces, and tries to do a routing lookup. It totally
ignores marking when building the routing key, but weirdly enough,
it does check the TOS.
OOOOOK. So I read some documentation about TOS, and since I was still looking for a solution, I did it quick and dirty. It works. This is the reason for change no. 1.
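A minimal sketch of this workaround, under stated assumptions: table 100, the Livebox gateway 192.168.2.1, the LAN 192.168.1.0/24 behind eth_lan, and the TOS value 0x04 are all hypothetical. The idea is to select the Livebox table on a TOS value instead of a mark, so that the reversed lookup, which keeps the TOS, ends up in the same table.

```shell
# Hypothetical values: table 100, gateway 192.168.2.1 on eth_livebox,
# LAN 192.168.1.0/24 on eth_lan, TOS 0x04 used as a tag.

# Table 100: the LAN, plus a default route through the Livebox.
ip route add 192.168.1.0/24 dev eth_lan table 100
ip route add default via 192.168.2.1 dev eth_livebox table 100

# Any lookup whose key carries TOS 0x04 uses table 100. The reverse path
# check keeps the TOS in its key, so this rule applies there as well.
ip rule add tos 0x04 lookup 100

# Tag returning packets coming back through the Livebox, before routing.
iptables -t mangle -A PREROUTING -i eth_livebox \
  -m conntrack --ctstate ESTABLISHED,RELATED -j TOS --set-tos 0x04

# Check: the input lookup, including rp_filter, should now succeed.
ip route get 192.168.1.10 from 203.0.113.10 iif eth_livebox tos 0x04
```

Note that rewriting the TOS also changes the header of the packets forwarded to the LAN hosts.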
Can it be better?
I'll let you check the code of fib_validate_source(). Honestly, it's too heavy for me.
But in my opinion, the result is inconsistent. I know that TOS is a field of the IP header, while firewall marks only exist inside the host. But on the other hand, ip rule can select a routing table either on the TOS header value (with tos) or on the firewall mark value (with fwmark).
For now, I don't really know what I should do; here are my conclusions, which are not mutually exclusive.
Give up rp_filter on public interfaces
The goal of rp_filter is to mitigate DDoS attacks, but also to filter out rogue clients that forge packets from within my own managed network. It is a bit like SPF: it protects other actors.
On my public interfaces, I obviously have a routing entry like default via IP, so rp_filter will conclude anyway that the packet can be answered. Indeed, if a packet reaches my router, it is because my ISP let it through and managed to route it.
So I could give up and set rp_filter to 0 on all those interfaces (warning: the maximum of net.ipv4.conf.eth_livebox.rp_filter and net.ipv4.conf.all.rp_filter is what gets applied).
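Because of that maximum rule, lowering only the per-interface value has no effect while conf.all.rp_filter is still 1. A minimal sketch, with eth_lan as a hypothetical name for the LAN-side interface: lower "all" first, then keep strict mode only where spoofing should still be filtered.

```shell
# The kernel applies max(conf.all.rp_filter, conf.<iface>.rp_filter),
# so "all" has to be lowered as well.
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.eth_livebox.rp_filter=0
sysctl -w net.ipv4.conf.eth_adsl.rp_filter=0
# Keep strict checking (1) on the LAN side; eth_lan is a hypothetical name.
sysctl -w net.ipv4.conf.eth_lan.rp_filter=1
```

These runtime settings are lost at reboot unless they are also written to a file under /etc/sysctl.d/.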
EDIT: Use rpfilter from iptables
Someone on LinuxFR drew my attention to this: the rp_filter control is deprecated, or at least abandoned. There is indeed an rpfilter module for iptables, which is meant to replace it. For example, taken from here:
iptables -A PREROUTING -t raw -m rpfilter --invert -j DROP
ip6tables -A PREROUTING -t raw -m rpfilter --invert -j DROP
It is well integrated in the firewall, it works, and returning packets don't even need to be marked, since they are recognized by their state.
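A minimal sketch of switching over, assuming the sysctl check should be turned off so that only the rpfilter match decides (otherwise both checks apply, and the sysctl one may drop packets first). Since the effective value is the maximum of the "all" and per-interface settings, it writes 0 to every IPv4 rp_filter entry; it then lists the raw table with packet counters to see how many packets the rules above have dropped. It must run as root.

```shell
# Turn off the legacy check everywhere, including "all" and "default",
# so that the rpfilter match in the raw table is the only reverse path check.
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
  echo 0 > "$f"
done

# Counters of the raw PREROUTING rules: packets that failed the reverse
# path lookup and were dropped.
iptables -t raw -L PREROUTING -v -n
ip6tables -t raw -L PREROUTING -v -n
```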
Report this "bug" to kernel developers
It seems very inconsistent to me, and badly documented on top of that. ip rule lets you write rules that work for incoming packets, but not for returning ones: that is a misbehavior.
But here I am: I don't have the time to get skilled enough to read this code, understand it, and try to fix it.
And I don't even know whether there is a good reason for it, for instance that firewall marks may not be available when fib_validate_source is called.
But if someone here tells me where it could be reported, to someone who cares and could explain it, and maybe fix and improve it, I will gladly do it.
EDIT: Maybe the documentation of the rp_filter parameter should be updated…