
Sometime in the last two days, an EC2 instance running a WireGuard VPN stopped working in the middle of the night. After troubleshooting, the problem appeared to be that incoming UDP packets were not reaching the instance. I edited the security group to allow incoming UDP packets and everything started working again.

Initially, I thought someone had changed the security group, but CloudTrail shows my edit as the only action in the last 90 days. That must mean there is either a bug, or security group behavior changed in the last two days so that return packets for an existing connection are no longer allowed back in.

Michael's answer to "AWS Security Group show 'UDP Port open' while it should deny?" is what I would assume to be true.

Am I mistaken? Is there something else I should consider?

EDIT 1: Some clarifications

The EC2 instance connects via WireGuard to a machine in a data center. Both ends of the connection run WireGuard in a Docker container.

My troubleshooting steps were roughly the following (please ignore the exact syntax, since netcat options vary across Linux distributions):

On the EC2 machine I ran the following:

echo help | nc -u -l 5000

On the data center machine I ran the following:

nc -u ec2.example.com 5000

I found this didn't work until I added the UDP rule to the SG. Once I did, the tcpdump I was running in another window exploded with traffic.
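Because netcat flags differ between distributions, one way to take netcat out of the picture is to send the datagrams with bash's built-in /dev/udp redirection and watch for them with tcpdump on the receiving side. This is a rough sketch, assuming bash (not sh/dash) on the sending machine; send_udp_probe is a hypothetical helper name, and port 5000 just mirrors the test above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send a few UDP datagrams without netcat, using
# bash's /dev/udp/HOST/PORT redirection. Requires bash, not sh/dash.
# Usage: send_udp_probe HOST PORT [COUNT]
send_udp_probe() {
  local host="$1" port="$2" count="${3:-3}"
  local i
  for ((i = 1; i <= count; i++)); do
    # Each redirection opens a UDP socket, sends one datagram, then closes it.
    printf 'probe %d\n' "$i" > "/dev/udp/${host}/${port}" || return 1
    # Pause between probes so they are easy to tell apart in tcpdump.
    if ((i < count)); then
      sleep 1
    fi
  done
  return 0
}
```

For example, run sudo tcpdump -ni any udp port 5000 on the EC2 instance and send_udp_probe ec2.example.com 5000 on the data center machine. If tcpdump shows nothing, the datagrams are being dropped before they reach the instance, regardless of which netcat variant is installed.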

lopass
  • A change of this magnitude seems .. unlikely. AWS is rather risk averse to such a massive change without a clear and lengthy migration strategy. I'd be more inclined to think a security group was added or removed from the instance that changed the overall policy for UDP traffic. – Anon Coward Mar 31 '23 at 18:29
  • I agree, I would have expected to see other people complaining. I am leaning towards bug that might disappear if I delete the EC2 instance and recreate it – lopass Apr 01 '23 at 20:16

1 Answer


A more likely explanation is that WireGuard changed its logic so that the response comes from a different server than the one that received the original request. You can enable VPC Flow Logs to see whether this is what is going on.
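Once Flow Logs are enabled, rejected traffic can be picked out of the records directly. This is a rough sketch, assuming the default (version 2) record format read from stdin, for example after exporting the log lines from CloudWatch Logs or S3. rejected_udp is a hypothetical helper, and 51820 is only the port commonly used in WireGuard examples, so substitute the port your peer actually listens on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print rejected UDP flow records for one destination
# port, reading VPC Flow Log lines in the default version 2 format on stdin.
# Field order: version account-id interface-id srcaddr dstaddr srcport
#              dstport protocol packets bytes start end action log-status
# Protocol number 17 is UDP. The default port 51820 is an assumption.
rejected_udp() {
  local port="${1:-51820}"
  awk -v port="$port" '
    # $7 = dstport, $8 = protocol, $13 = action; a header line is skipped
    # automatically because its protocol field is not 17.
    $8 == 17 && $7 == port && $13 == "REJECT" {
      print $4 ":" $6 " -> " $5 ":" $7 " (" $9 " packets)"
    }'
}
```

REJECT records whose source address is not the data center machine would support the different-server theory; REJECT records from the expected peer point back at the inbound rules (security group or network ACL) instead.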

Alex Chadyuk
  • It is a single server in a data center under our control. Let me edit my question to include a few things I did to troubleshoot – lopass Apr 01 '23 at 20:07