I've got an issue which caused me quite a bit of a headache and found the solution, but I cannot understand why the problem was happening in the first place.
I've got an nginx reverse proxy which is routing requests to 1 of 3 servers. All are node servers, two of them are express and one is a custom built one.
The issue: when my firewall is up, the first request in the total uptime of the two express servers suffers a 504:Gateway Timeout. After that happens, the files are served quickly and without issue, even over the course of multiple days.
I was able to narrow the problem down to the firewall by noticing that the issue disappears when I bring down the firewall completely. My server is running FreeBSD, using PF as a firewall. Here is the relevant pf.conf:
iface = "vtnet0"
loopback = "lo0"
public_ip = "132.148.77.28/32"
localnet = "127.0.0.1"
nat on $iface from any to any -> $public_ip # added by cloud server provider
pass out quick on $iface proto { tcp udp } from port ntp keep state
block log all
pass in on $iface proto tcp from $localnet to port mail
pass in log on $iface proto tcp to $public_ip port { ssh http } keep state
pass in log on $loopback proto tcp to $localnet port { 5555 5556 5557 } keep state
pass out log all keep state
I understand that any traffic on the vtnet0
device will be translated to using the public-ip.
I ran a tcpdump with the following command:
sudo tcpdump -c 10 -vvv -t -i pflog0 -e -n tcp
I captured 10 packets, with triple verbosity, suppressing the timestamp, on pflog0, printing the pf rule which allowed or blocked the packet, leaving ip addresses and ports as numbers, and only capturing traffic using tcp.
Here are the interesting results of the first request, along with the results of subsequent requests:
$ sudo tcpdump -c 10 -vvv -t -i pflog0 -e -n tcp
tcpdump: WARNING: pflog0: no IPv4 address assigned
tcpdump: listening on pflog0, link-type PFLOG (OpenBSD pflog file), capture size 65535 bytes
rule 8..16777216/0(match): pass in on vtnet0: (tos 0x0, ttl 52, id 53488, offset 0, flags [DF], proto TCP (6), length 60)
67.197.156.119.45274 > 132.148.77.28.80: Flags [S], cksum 0x76df (correct), seq 356096480, win 29200, options [mss 1460,sackOK,TS val 63916304 ecr 0,nop,wscale 7], length 0
rule 9..16777216/0(match): pass out on lo0: (tos 0x0, ttl 64, id 2406, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.28850 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0x180a), seq 2069090201, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33367168 ecr 0], length 0
rule 5..16777216/0(match): pass in on lo0: (tos 0x0, ttl 64, id 2406, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->3354)!)
127.0.0.1.28850 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0x180a), seq 2069090201, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33367168 ecr 0], length 0
rule 9..16777216/0(match): pass out on lo0: (tos 0x0, ttl 64, id 2414, offset 0, flags [DF], proto TCP (6), length 60)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0x1728), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33367544 ecr 0], length 0
rule 2..16777216/0(match): **block in on lo0**: (tos 0x0, ttl 64, id 2414, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->8ded)!)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0x1728), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33367544 ecr 0], length 0
rule 2..16777216/0(match): **block in on lo0**: (tos 0x0, ttl 64, id 2416, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->8deb)!)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0x0b6a), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33370550 ecr 0], length 0
rule 2..16777216/0(match): **block in on lo0**: (tos 0x0, ttl 64, id 2418, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->8de9)!)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0xfee6), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33373753 ecr 0], length 0
rule 2..16777216/0(match): **block in on lo0**: (tos 0x0, ttl 64, id 2420, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->8de7)!)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0xf265), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33376954 ecr 0], length 0
rule 2..16777216/0(match): **block in on lo0**: (tos 0x0, ttl 64, id 2422, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->8de5)!)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0xe5c8), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33380183 ecr 0], length 0
rule 2..16777216/0(match): **block in on lo0**: (tos 0x0, ttl 64, id 2424, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->8de3)!)
132.148.77.28.48488 > 132.148.77.28.80: Flags [S], cksum 0xa38f (incorrect -> 0xd943), seq 3973611725, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 33383388 ecr 0], length 0
10 packets captured
11 packets received by filter
0 packets dropped by kernel
$ sudo tcpdump -c 10 -vvv -t -i pflog0 -e -n tcp
tcpdump: WARNING: pflog0: no IPv4 address assigned
tcpdump: listening on pflog0, link-type PFLOG (OpenBSD pflog file), capture size 65535 bytes
rule 8..16777216/0(match): pass in on vtnet0: (tos 0x0, ttl 52, id 408, offset 0, flags [DF], proto TCP (6), length 60)
67.197.156.119.45482 > 132.148.77.28.80: Flags [S], cksum 0x2e49 (correct), seq 3314040182, win 29200, options [mss 1460,sackOK,TS val 64109806 ecr 0,nop,wscale 7], length 0
rule 9..16777216/0(match): pass out on lo0: (tos 0x0, ttl 64, id 2446, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.19450 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0x603e), seq 629617255, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 34141175 ecr 0], length 0
rule 5..16777216/0(match): pass in on lo0: (tos 0x0, ttl 64, id 2446, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->332c)!)
127.0.0.1.19450 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0x603e), seq 629617255, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 34141175 ecr 0], length 0
rule 9..16777216/0(match): pass out on lo0: (tos 0x0, ttl 64, id 2459, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.54320 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0x8ec2), seq 2137792783, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 34141359 ecr 0], length 0
rule 5..16777216/0(match): pass in on lo0: (tos 0x0, ttl 64, id 2459, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->331f)!)
127.0.0.1.54320 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0x8ec2), seq 2137792783, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 34141359 ecr 0], length 0
rule 8..16777216/0(match): pass in on vtnet0: (tos 0x0, ttl 52, id 13349, offset 0, flags [DF], proto TCP (6), length 60)
67.197.156.119.45484 > 132.148.77.28.80: Flags [S], cksum 0x9e57 (correct), seq 2310473985, win 29200, options [mss 1460,sackOK,TS val 64109860 ecr 0,nop,wscale 7], length 0
rule 8..16777216/0(match): pass in on vtnet0: (tos 0x0, ttl 52, id 32739, offset 0, flags [DF], proto TCP (6), length 60)
67.197.156.119.45486 > 132.148.77.28.80: Flags [S], cksum 0x5b67 (correct), seq 2464105159, win 29200, options [mss 1460,sackOK,TS val 64109860 ecr 0,nop,wscale 7], length 0
rule 8..16777216/0(match): pass in on vtnet0: (tos 0x0, ttl 52, id 55497, offset 0, flags [DF], proto TCP (6), length 60)
67.197.156.119.45488 > 132.148.77.28.80: Flags [S], cksum 0x832f (correct), seq 3354322413, win 29200, options [mss 1460,sackOK,TS val 64109860 ecr 0,nop,wscale 7], length 0
rule 9..16777216/0(match): pass out on lo0: (tos 0x0, ttl 64, id 2474, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.60189 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0xe2c0), seq 1589892785, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 34141386 ecr 0], length 0
rule 5..16777216/0(match): pass in on lo0: (tos 0x0, ttl 64, id 2474, offset 0, flags [DF], proto TCP (6), length 60, bad cksum 0 (->3310)!)
127.0.0.1.60189 > 127.0.0.1.5556: Flags [S], cksum 0xfe30 (incorrect -> 0xe2c0), seq 1589892785, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 34141386 ecr 0], length 0
10 packets captured
14 packets received by filter
0 packets dropped by kernel
I added newlines in between the packet info for better readability. You'll notice in the first dump where "rule 2" is blocking packets, which happens only in the very first request after the upstream server is started afresh.
What gives? You'll notice too that those blocked packets are on the loopback interface, but they coming from and going to the public ip address. I think they shouldn't be NAT'ed because they're not on the vtnet0 interface. Even still, why are subsequent requests taking a different path to facilitate communication from nginx to the upstream server?
The fix I found was to change the following line from my pf.conf:
pass in log on $iface proto tcp to $public_ip port { ssh http } keep state
to:
pass in log proto tcp to $public_ip port { ssh http } keep state
I removed the on $iface
part so the firewall will let pass even packets to $public_ip port 80 on the loopback interface. I also have mild concerns about the security implications of loosening that rule - which are mild only because I don't fully understand any longer how the traffic is operating between these different network interfaces.
At this point I'm not even sure if the issue is with my firewall or with nginx, just that when the firewall is disabled, the issue vanishes.
Note that nginx is proxy_pass
'ing the requests in three different server
blocks to http://127.0.0.1:[port]
where port is one of 5555, 5556, or 5557.