
I've been working on a routing problem (multiple interfaces attached to a single Docker container, with response packets needing to leave through the right interface) and have come across an interesting observation: TRACE logging only ever shows the Docker network address as the packet's source IP, while tcpdump shows the actual source IP address of the attached interface. See below. Can someone tell me where this source address comes from? As a bonus question, if anyone has an idea: how would I match this source address in an iptables rule, if that is possible at all?

Oct 23 09:54:43 <hostname> kernel: [145206.331674] TRACE: raw:PREROUTING:policy:3 IN=br-55939cd46cf5 OUT= PHYSIN=<phys> MAC=<mac> SRC=172.23.0.2 DST=<ext ip> LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=63742 SEQ=515334190 ACK=1161940855 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0228C591A136C09A01030307)

10.112.0.103.80 > <external ip>.64710: Flags [S.], cksum 0x839a (incorrect -> 0x3647), seq 3129672596, ack 2031230462, win 28960, options [mss 1460,sackOK,TS val 36559662 ecr 2706000943,nop,wscale 7], length 0

This is not my actual problem, but it will at least help me understand it. Thanks in advance!

  • It sounds like you have NAT in your setup somewhere. Try to get rid of it if possible. – Michael Hampton Oct 22 '20 at 23:59
  • Indeed there is - Docker does NAT there, and I wouldn't want to remove that. My question is more towards why there's a disparity in the source address between what TRACE shows me and tcpdump. – Jumail Mundekkat Oct 23 '20 at 02:58
  • One is before NAT and one is after NAT. – Michael Hampton Oct 23 '20 at 03:23
  • "_Docker does NAT there, and I wouldn't want to remove that._" Why not? It looks like you are using private addresses, so there is no reason to NAT from private to private unless the addressing overlaps, and it does not appear to overlap. You should just route. NAT is not a substitute for routing. – Ron Maupin Oct 23 '20 at 21:53
  • TRACE logging showed that even after all the NAT rules, the source IP was still the one of the Docker bridge interface so unless the TRACE logs don't actually show when this happens, I don't think this is the case. – Jumail Mundekkat Oct 27 '20 at 06:06
  • @RonMaupin You may be right, but the solution would require me to disable Docker's network configuration and do it all by hand, which I'd really rather avoid needing to do in this case (we need a temporary solution that works effectively). – Jumail Mundekkat Oct 27 '20 at 06:08

1 Answer

There are many places where a network packet (e.g. a TCP/IP packet) can be inspected on a Linux system. When you mention TRACE, I'll assume you mean the TRACE target from iptables. tcpdump and iptables look at packets at different points in the packet's flow through the system. So, as Michael Hampton commented, "One is before NAT and one is after NAT".
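To make the "before NAT / after NAT" point concrete, here is a toy model in Python. It is only a sketch under assumptions: the stage names are illustrative and not the kernel's real hook list, the NAT step stands in for whatever rewrite Docker's rules cause on the reply, `CLIENT_IP` is a placeholder for the redacted external address, and I'm assuming tcpdump was run on the outgoing interface. The two addresses are the ones from the question.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str

CONTAINER_IP = "172.23.0.2"    # SRC seen in the TRACE record
HOST_IP = "10.112.0.103"       # source seen by tcpdump
CLIENT_IP = "198.51.100.7"     # placeholder for the redacted external IP

def observe(name, log):
    """Return a stage that records the source address and passes the packet on."""
    def stage(pkt):
        log.append((name, pkt.src))
        return pkt
    return stage

def nat_rewrite(pkt):
    """Stand-in for the NAT step: replace the container address with the host's."""
    if pkt.src == CONTAINER_IP:
        return replace(pkt, src=HOST_IP)
    return pkt

def run(pkt, stages):
    """Push the packet through each stage in order."""
    for stage in stages:
        pkt = stage(pkt)
    return pkt

log = []
stages = [
    observe("TRACE (before NAT)", log),    # hypothetical early inspection point
    nat_rewrite,
    observe("tcpdump (after NAT)", log),   # hypothetical late inspection point
]
final = run(Packet(CONTAINER_IP, CLIENT_IP), stages)

for name, src in log:
    print(f"{name}: SRC={src}")
```

The same packet is reported with two different source addresses purely because of where each observer sits relative to the rewrite.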

There are a number of useful diagrams depicting packet flow through a Linux system (search for "linux network packet flow"). For more detail, have a look at the diagram in the following Unix & Linux Stack Exchange question:

https://unix.stackexchange.com/questions/281108/understanding-bridge-check-hop-in-packet-flow-in-linux-kernel

Also available in SVG here: https://en.wikipedia.org/wiki/Netfilter#/media/File:Netfilter-packet-flow.svg

In that diagram, I believe tcpdump (via libpcap) inspects the packet at the step labeled "taps (e.g. AF_PACKET)". Depending on where you inserted your TRACE, you might then see a different source address. Where did you insert your TRACE (e.g. on the base host or in the Docker container)? I should ask that in a comment, but I don't yet have enough reputation to comment on a question on Server Fault.
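As a side note, a TRACE record carries its own location: the prefix has the form `TRACE: table:chain:type:rulenum`. Below is a small Python sketch that pulls that out together with the `KEY=value` fields. The sample line is a shortened copy of the record from the question with the redacted fields left out, and `parse_trace` is just a helper name I made up for illustration.

```python
import re

# Shortened copy of the TRACE record from the question (redacted fields omitted).
TRACE_LINE = (
    "TRACE: raw:PREROUTING:policy:3 IN=br-55939cd46cf5 OUT= "
    "SRC=172.23.0.2 LEN=60 PROTO=TCP SPT=80 DPT=63742"
)

# Prefix written by the TRACE target: table, chain, type, rule number.
TRACE_RE = re.compile(
    r"TRACE: (?P<table>[^:\s]+):(?P<chain>[^:\s]+):(?P<kind>[^:\s]+):(?P<rulenum>\d+)"
)
# Upper-case KEY=value pairs; the value may be empty (e.g. "OUT=").
FIELD_RE = re.compile(r"\b([A-Z]+)=(\S*)")

def parse_trace(line):
    """Split a TRACE record into its location prefix and its packet fields."""
    match = TRACE_RE.search(line)
    if match is None:
        raise ValueError("not a TRACE record")
    info = match.groupdict()
    info["fields"] = dict(FIELD_RE.findall(line))
    return info

info = parse_trace(TRACE_LINE)
print(info["table"], info["chain"], info["kind"], info["rulenum"])
print("SRC =", info["fields"]["SRC"], "IN =", info["fields"]["IN"])
```

For the record in the question this reports the raw table and the PREROUTING chain, with the packet arriving on the Docker bridge.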

Super User also has a similar question with a nice answer: https://superuser.com/questions/925286/does-tcpdump-bypass-iptables

DericS
  • Thank you for the insight, that diagram has come in handy many times. I inserted my TRACE on the base host, in PREROUTING on the "raw" table. I would have presumed that both the pre-NAT and post-NAT packets would show up in the TRACE log records (and yes, that is TRACE from iptables), but this was not the case. – Jumail Mundekkat Oct 27 '20 at 06:22
  • I'm making some assumptions where I should probably ask for clarification. Are <ext ip> from your TRACE and <external ip> from tcpdump the same IP? Is 172.23.0.2 the IP of your Docker container? Do you have a web server (or something else responding on TCP port 80) in your Docker container? Is your tcpdump running on the physical interface of your local network (and not on the interface going to your Docker container)? Are the TRACE packet and the tcpdump packet the same packet transiting through the base host? – DericS Oct 29 '20 at 16:37