
Short summary of my issue: How or why can a stable server suddenly start to perform worse for several days, only to later (seemingly automatically) return to normal?


I'm doing some performance testing of a VPN server solution. The setup is basically the following:

Client (Linux) connects to VPN server which in turn routes traffic to/from a node on the "internal" network (also Linux).

The test in short: On the internal node (Linux), I execute the command

iperf -s -p 111 -u

Then, on the client, I execute

iperf -t20 -c<internal ip> -p 111 -l1000 -b100M -u

These tests usually produce around 40 Mbit/s throughput with ~0% packet loss on a consistent and regular basis: using the same hardware setup (with minor software changes for client/server), these tests have passed for over 100 days straight. However, for the last week the throughput has decreased by about 10% per day and the packet loss has increased by about 10% per day.
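For reference, here is a minimal sketch of what the client parameters mean in terms of datagrams. It assumes that iperf reads `-b100M` as 100 × 10^6 bits/s (the exact meaning of the suffix may differ between iperf versions) and that `-l1000` is the UDP payload size in bytes, ignoring header overhead. The function and constant names are my own, purely for illustration.

```python
def datagrams_per_second(bits_per_second: float, payload_bytes: int) -> float:
    """Datagram rate needed to carry the given bit rate (payload only)."""
    return bits_per_second / (payload_bytes * 8)


# Values taken from the iperf client command and the usual test result.
OFFERED_BPS = 100e6    # -b100M, ASSUMED to mean 100 * 10^6 bits/s
PAYLOAD_BYTES = 1000   # -l1000
DURATION_S = 20        # -t20
TYPICAL_BPS = 40e6     # throughput usually measured (~40 Mbit/s)

offered_rate = datagrams_per_second(OFFERED_BPS, PAYLOAD_BYTES)
typical_rate = datagrams_per_second(TYPICAL_BPS, PAYLOAD_BYTES)

print(f"offered: {offered_rate:.0f} datagrams/s, "
      f"{offered_rate * DURATION_S:.0f} per {DURATION_S} s run")
print(f"typical: {typical_rate:.0f} datagrams/s, "
      f"{typical_rate * DURATION_S:.0f} per {DURATION_S} s run")
```

Under these assumptions, the client offers 12,500 datagrams per second, while the usual 40 Mbit/s corresponds to about 5,000 datagrams per second arriving at the internal node.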

I ran different variants of the iperf tests (varied parameters etc.) after doing the following:

  • Restarted the client, server and internal node
  • Replaced client and server software with previous, stable, builds
  • Replaced cables and switches between client/server and server/internal node
  • Tried with different clients and different servers (hardware)

None of it had any impact.

However, all of a sudden, it just started working. One run had 20 Mbit/s with 60% packet loss, and all runs thereafter were "back to normal". Note that this was hours after replacing the hardware/software and running the above test approximately 100 times.

I have since restored the hardware and software to their original setup, repeated the tests about 100 times more, and the numbers look (consistently) good. I.e. my "problem" is "solved".

I am, however, extremely perplexed about what caused this issue. None of the actions I took to remedy the problem seem to have had an effect: it took hours after changing everything for the results to become stable, and the current setup is identical to what it was when everything started to fail.

I am a beginner when it comes to network administration/engineering, so I don't have a clue regarding:

  • What caused the problem to appear in the first place?
  • How did the problem solve itself?
  • How should I have approached this problem?

I am asking these questions here since I'm frustrated; I have not learned anything new and I don't know what to do the next time this problem appears. Perhaps my questions are too broad, but any helpful tips or resources describing similar problems and solutions would be appreciated!

ledwinder96