How to skip/fix NAT on raw socket output

Question

I have a caching application that runs in userspace and provides acceleration services to clients running on external hosts elsewhere in the network. Briefly put, my program watches network traffic and does deep packet inspection to generate accelerated replies to some client requests.

For reasons that are long and boring, I wanted to add some NAT functionality. As a proof of concept, I was hoping to put a front end on my application using iptables/netfilter. Mostly, it works great. I can NAT successfully, and I can forward traffic to my application using NFQUEUEs, allowing it to read the packets and inspect them.

However, when my cache attempts to generate a response back to the client, I have difficulty. I'm trying to feed internally generated response packets to the network via a raw socket so that they are sent back to the client. I find that the packets' TCP source port is being changed. The packet I hand to the raw socket has source port = 2049 (NFS), but what actually comes out has source port = 1024.

Upon further analysis, I suspect that my generated packets are running afoul of netfilter's connection tracking and NAT code. Netfilter thinks they are not part of the connection that I'm injecting them into, but they have the same tuple as that connection. So it thinks it's seeing a tuple collision and rewrites the source port. This is obviously a problem, as the packets are supposed to look to the client like they belong to the same connection.

Is there a good way to just skip the final NAT steps for some packets? Failing that, is there a way to programmatically tell netfilter that my packets belong to a particular client connection, even though they come from a raw socket rather than the network?

Can you get rid of the NAT? It will only be a source of heartache and suffering. — Michael Hampton, Sep 21 '20 at 23:03
Sadly, no. I need the NAT capabilities for what I'm trying to accomplish (it solves quite a number of deployment problems for me). Eventually, I'll write my own NAT code. But for the moment, I'm trying to prove the concept out without doing all of that in advance. — SynAckRst, Sep 21 '20 at 23:11

score 2 · Answer 1 · answered Sep 22 '20 at 06:33

This is a corner case in the Netfilter code involving raw sockets and connection tracking. Netfilter's connection tracking changes the source port because, for some reason, the packet sent out through the raw socket does not match the existing connection tracking entry.

You can work around the behavior with the following iptables rule:

iptables -t raw -I OUTPUT -p tcp -j CT --notrack

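Another option is to set the following sysctl value, which stops connection tracking from picking up TCP connections that are already in progress (the kernel default is 1, which allows it):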

net.netfilter.nf_conntrack_tcp_loose=0
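A minimal sketch of applying that setting, assuming a typical Linux system with the sysctl tool and an /etc/sysctl.d/ directory. Both commands need root, and the file name 90-conntrack-tcp-loose.conf is only a hypothetical example.

```shell
# Takes effect immediately, but is lost on reboot.
# The net.netfilter.nf_conntrack_* keys only exist while the nf_conntrack
# module is loaded (which it will be if NAT is in use).
sysctl -w net.netfilter.nf_conntrack_tcp_loose=0

# Persist across reboots: *.conf files in /etc/sysctl.d/ are read at boot.
# The file name is an arbitrary example. On some systems this may not apply
# if nf_conntrack is loaded after the boot-time sysctl settings are processed.
echo 'net.netfilter.nf_conntrack_tcp_loose = 0' > /etc/sysctl.d/90-conntrack-tcp-loose.conf
```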

I don't know what side effects either of these settings may have on other operations. In my case, I haven't noticed anything, but your case might be different.
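Given the unknown side effects, it may help to be able to check and back out either workaround. A sketch, assuming the rule was inserted exactly as written in this answer (run as root):

```shell
# List the raw table's OUTPUT chain with packet counters, to confirm the
# NOTRACK rule is present and see how many packets it has matched.
iptables -t raw -L OUTPUT -n -v --line-numbers

# Delete the rule by repeating the same match specification used with -I.
iptables -t raw -D OUTPUT -p tcp -j CT --notrack

# Restore the kernel default for loose TCP pickup (1 = enabled).
sysctl -w net.netfilter.nf_conntrack_tcp_loose=1
```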

Tero Kilkanen

That's fantastic, thank you. I presume I can also match on the fwmark in that iptables command, right? I'm able to mark packets coming from the raw socket (via setsockopt with SO_MARK). Or do I need to have it not track any connections at all to enjoy this behavior? – SynAckRst Sep 22 '20 at 12:24
You should be able to match packet marks in the rule, which is a good way to limit the rule to your generated packets only. – Tero Kilkanen Sep 22 '20 at 16:41
Yes, matching the marks appears to work. Thanks to both of you for the help. – SynAckRst Sep 22 '20 at 18:50
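A mark-restricted version of the NOTRACK rule, as discussed in these comments, might look like the sketch below. The mark value 0x2a is a hypothetical placeholder; it has to match whatever value the application passes to setsockopt(SO_MARK) on its raw socket (setting SO_MARK itself requires CAP_NET_ADMIN).

```shell
# Hypothetical mark value; must equal the value set with
# setsockopt(fd, SOL_SOCKET, SO_MARK, ...) on the raw socket.
MARK=0x2a

# Skip connection tracking (and therefore NAT) only for locally generated
# TCP packets carrying that mark; all other TCP output is still tracked.
iptables -t raw -I OUTPUT -p tcp -m mark --mark "$MARK" -j CT --notrack
```

Adding `--sport 2049` after `-p tcp` would narrow the match further to the NFS replies described in the question.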
