DNAT without default route

Question

I have a TCP service in a datacenter that is doing filtering and rate limiting based on source IP address. I'd like to move it to another datacenter.

I'd like to provide the same service on an IP address from the new datacenter and forward all traffic on a single port to the old one, so both new and old will work at the same time. I can't just change the hostname, as some clients connect by IP address (sigh) and some filter their outgoing connections by IP address (sigh), and it will take weeks to change them.

I know that I can SNAT the connections, but if I do this all the connections will be sourced from the same IP, which conflicts with filtering and rate limiting based on source IP address.

I can DNAT the connections and route them through a VPN tunnel, but then the return packets will leave via the service's default route with the service's own source IP address and will be ignored by the clients.

Is there a way with Linux to somehow mark the TCP packets that were DNAT'ed so the return packets can be routed back through the VPN tunnel instead of the service's default route?

score 2 · Accepted Answer · answered May 24 '19 at 10:31

There are various ways to implement what you want. It helps to draw your network topology first.

Simplest way

It requires only a single DNAT rule on S2 (the server in the new DC) and additional routing configuration on S1 (the server in the old DC). It also requires that your app accept requests on the VPN tunnel address too.

The S2 server iptables configuration:

iptables -t nat -A PREROUTING \
         -i eth0 --dst <S2.IP> \
         -p tcp --dport <APP.PORT> \
    -j DNAT --to-destination <S1.TUN.IP>:<APP.PORT>

Also, you should enable forwarding on the S2 server (use the sysctl -w net.ipv4.ip_forward=1 command to enable it).

Verification: run ip route get <S1.TUN.IP> from 8.8.8.8 iif <S2.IFACE> and ip route get 8.8.8.8 from <S1.TUN.IP> iif <S2.TUN.IFACE>. Both commands should return valid routes.

The S1 server routing configuration:

ip route add 0/0 dev <TUN.IFACE> table 1
ip rule add from <S1.TUN.IP> lookup 1 pref 1000

Linux replies to a request from the same IP address on which the request was received.

Verification: run ip route get <S1.TUN.IP> from 8.8.8.8 iif <S1.TUN.IFACE> and ip route get 8.8.8.8 from <S1.TUN.IP>. Both commands should also return valid routes. If you see an error like "Invalid cross-device link", tune rp_filter on the VPN tunnel interface.
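A minimal sketch of that rp_filter tuning on S1, assuming the tunnel interface is named tun0 (a hypothetical name, use your own). The `run` helper is also an assumption of this sketch: it only prints each command unless you set APPLY=1 and run the script as root. Loose mode (2) is one possible setting, not necessarily the right one for your network.

```shell
#!/bin/sh
# Hypothetical tunnel interface name; substitute your own <S1.TUN.IFACE>.
TUN_IFACE="${TUN_IFACE:-tun0}"

# Print each command by default; set APPLY=1 to really run it (as root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Show the current values. The kernel applies the higher of the "all"
# and the per-interface value when validating a packet's source address.
run sysctl net.ipv4.conf.all.rp_filter "net.ipv4.conf.${TUN_IFACE}.rp_filter"

# 2 = loose mode: accept a packet if its source is reachable via any interface.
run sysctl -w "net.ipv4.conf.${TUN_IFACE}.rp_filter=2"
```

Because the higher of the two values wins, setting the tunnel interface to 2 is enough even when net.ipv4.conf.all.rp_filter is 1.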

Detailed explanation:

The client sends a request of the form <C.IP>:<SOME.PORT> -> <S2.IP>:<APP.PORT>.
S2 receives this request and rewrites the destination to <S1.TUN.IP>. This happens before routing, so after this step the packet has the form <C.IP>:<SOME.PORT> -> <S1.TUN.IP>:<APP.PORT>.
S2 forwards the rewritten request through the VPN tunnel, as its routing table dictates.
S1 receives the request through the VPN tunnel on the <S1.TUN.IP> address.
Your app on S1 serves the request and replies to the client with source address <S1.TUN.IP>. The reply is <S1.TUN.IP>:<APP.PORT> -> <C.IP>:<SOME.PORT>.
The routing rule sends all packets with source address <S1.TUN.IP> to routing table 1, so the replies from your app go through the VPN tunnel to S2.
S2 receives the replies and reverses the translation of the source address, rewriting it from <S1.TUN.IP> to <S2.IP>. The reply becomes <S2.IP>:<APP.PORT> -> <C.IP>:<SOME.PORT>.
S2 forwards the rewritten replies to the client at <C.IP>.
The client receives the reply as expected.
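The translation in the steps above is tracked by conntrack on S2, so one way to watch it is to list the tracked connections. This is a sketch that assumes the conntrack-tools package is installed on S2 and uses 443 as a stand-in for <APP.PORT>. The `run` helper is an assumption of the sketch and only prints the command unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical port; substitute your own <APP.PORT>.
APP_PORT="${APP_PORT:-443}"

# Print each command by default; set APPLY=1 to really run it (as root, on S2).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# List tracked TCP connections whose original destination port is the app port.
# Each entry shows the original tuple (client -> S2) followed by the expected
# reply tuple (S1 tunnel address -> client).
run conntrack -L -p tcp --dport "$APP_PORT"
```

If the DNAT rule is working, the reply half of each entry should carry <S1.TUN.IP> as its source.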

To troubleshoot, you can use tcpdump.
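For example, something like the following could be run on S2 to compare what arrives on the public interface with what crosses the tunnel. The interface names (eth0 from the DNAT rule, tun0 for the tunnel) and port 443 are placeholders assumed for this sketch, and the `run` helper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical values; substitute your own interfaces and <APP.PORT>.
PUB_IFACE="${PUB_IFACE:-eth0}"
TUN_IFACE="${TUN_IFACE:-tun0}"
APP_PORT="${APP_PORT:-443}"

# Print each command by default; set APPLY=1 to really run it (as root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# -n skips name resolution, -c 20 stops after 20 packets.
# Public side: packets should be <C.IP> <-> <S2.IP>.
run tcpdump -n -c 20 -i "$PUB_IFACE" "tcp port $APP_PORT"

# Tunnel side: the same flows should appear as <C.IP> <-> <S1.TUN.IP>.
run tcpdump -n -c 20 -i "$TUN_IFACE" "tcp port $APP_PORT"
```

If requests show up on the tunnel but replies do not, the routing rule on S1 is the first thing to check.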

There is another way that is more complicated. I'll describe it if you need it.

There is one issue with this setup when the connection between the client and S1 is of poor quality. If a significant percentage of ACK packets is lost, the client will retransmit a lot. But conntrack by default only rewrites packets with sequence numbers that are near or within the current receive window. If S1 is sent a late packet, conntrack ignores it and S1 sees the packet unmodified. S1 won't recognize the connection and will send an RST to the client, terminating the connection prematurely. A workaround is to set `net.netfilter.nf_conntrack_tcp_be_liberal = 1` in sysctl.conf. — Tometzky, Nov 12 '19 at 08:59
[Docker uses DNAT and has similar problems](https://github.com/docker/libnetwork/issues/1090). — Tometzky, Nov 12 '19 at 09:00

score 0 · Answer 2 · answered May 23 '19 at 20:25

This was answered in a different post, though the OP's question was not exactly the same.

How to set mark on packet when forwarding it in nat prerouting table?

The first code block shows how to mark packets that are being DNAT'ed in PREROUTING.
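As a rough illustration of the marking idea (a sketch written for this thread, not copied from the linked post), the rules below would go on S1: they tag connections that arrive through the tunnel and route their replies back through it. The interface name tun0, the mark value 1 and routing table 1 are assumptions, and the `run` helper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical values; substitute your own tunnel interface, mark and table.
TUN_IFACE="${TUN_IFACE:-tun0}"
MARK=1
TABLE=1

# Print each command by default; set APPLY=1 to really run it (as root, on S1).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

setup_mark_routing() {
    # Tag every new connection that arrives through the tunnel.
    run iptables -t mangle -A PREROUTING -i "$TUN_IFACE" \
        -m conntrack --ctstate NEW -j CONNMARK --set-mark "$MARK"
    # Copy the connection mark onto locally generated reply packets.
    run iptables -t mangle -A OUTPUT -m connmark --mark "$MARK" \
        -j CONNMARK --restore-mark
    # Route marked packets with a table whose default route is the tunnel.
    run ip route add default dev "$TUN_IFACE" table "$TABLE"
    run ip rule add fwmark "$MARK" lookup "$TABLE" pref 1000
}

setup_mark_routing
```

Unlike a rule that matches on the source address, this keys the routing decision on the connection itself, so it does not depend on which local address the app replies from.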

Hope this helps, cheers!
