
I host a server on Google Compute Engine for various purposes, most notably a Minecraft server, and it receives a lot of traffic. We're talking fairly constant CPU usage of 150%, 50 reads/10 writes per second to disk, and 600 received/1,700 sent network packets per second, so quite a bit of load.

The issue I'm having is that, although the server should be entirely capable of handling this load, there are still latency and/or processing problems. Every so often, out of the blue, a task that should take only a tenth of a second takes 40 seconds or even longer.

Here are the possible problems that we have already considered and solutions we have already put in place:

  • CPU Usage
    • We use 100-250% CPU at all times, out of the 400% available from 4 vCPUs, so we should not be CPU-limited.
  • Disk IO
    • We max out at 80 reads and 30 writes per second, but use SSDs with a sustained random IOPS limit of 1,500, so this shouldn't be a problem either.
    • We max out at 1.6 MB/s read and 0.75 MB/s write, but use SSDs with a sustained throughput limit of 24 MB/s.
  • Network
    • We max out at 600 packets received (avg. 400) and 1700 packets sent (avg. 800) per second. I'm not sure how we can improve this in any way, but I fail to see how this would be an issue when the network provider is Google.
    • We max out at 28 KB received and 280 KB sent per second. Our network speedtest shows us capable of handling thousands of times this value.
    • The network is the least likely cause, as most of the issues we encounter are server-side.
    • We cannot do any form of load balancing by splitting up the user connections as they must all connect back to one server which hosts the Minecraft world.
  • RAM
    • We have 5 GB of RAM dedicated to the processes that are having difficulties. This makes me doubt its involvement, as we rarely use more than half of that.
  • Java
    • Like every Minecraft server, our server jar is written in Java. We are using the following Java version:
      java version "1.7.0_111"
      OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-1~deb8u1)
      OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)
    • We use the following parameters for the jar execution:
      -server -Xmx5G -Xms5G -Xmn2500M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalPacing -XX:ParallelGCThreads=4 -XX:+AggressiveOpts
    • We have considered the possibility that Java (and specifically Minecraft) simply has poor memory management, but we don't know how to fix that, if we can at all.
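One way to test the memory-management theory is to check whether the long stalls line up with garbage collection. The sketch below is a hypothetical helper (GcTimeSampler is not part of Minecraft or any server API) that reads the standard java.lang.management MXBeans once per interval and reports intervals with a lot of recorded GC time. It assumes it runs inside the server's own JVM (for example, started from a plugin), because the MXBeans only describe the JVM they are read from, and it sticks to Java 7 syntax to match the JVM above. getCollectionTime() is cumulative collection time, which for CMS may include concurrent phases that don't pause the game, so treat the numbers as approximate; the 60-sample, 1-second, 200 ms values are arbitrary.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic helper: only observes the JVM it is loaded into.
class GcTimeSampler {

    /** Sum of cumulative collection time (ms) over all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /**
     * Takes `samples` readings `intervalMillis` apart, prints any interval whose
     * recorded GC time exceeds `reportThresholdMillis`, and returns the largest
     * per-interval GC time seen.
     */
    static long sample(int samples, long intervalMillis, long reportThresholdMillis) {
        long previous = totalGcMillis();
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status and stop sampling
                break;
            }
            long current = totalGcMillis();
            long spent = current - previous; // GC time recorded since the last reading
            if (spent > reportThresholdMillis) {
                System.out.println("Recorded " + spent + " ms of GC time since the last sample");
            }
            worst = Math.max(worst, spent);
            previous = current;
        }
        return worst;
    }

    public static void main(String[] args) {
        // 60 one-second samples; report any interval with more than 200 ms of recorded GC time.
        long worst = sample(60, 1000, 200);
        System.out.println("Largest GC time recorded in one interval: " + worst + " ms");
    }
}
```

If a 40-second stall coincides with a large jump in recorded GC time, the heap and GC settings are a likely suspect; if the GC time stays flat during a stall, the cause is probably elsewhere.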

As you can see, we have looked into every bottleneck we could think of, but we've simply run out of ideas. Is our process being limited in some other way that we've missed, or is this a problem inherent in the software we host?

1 Answer


A couple of things to take into consideration:

a) GCE VMs have network egress throughput caps, as explained here. Persistent Disk (PD) write I/O and network traffic both count against this cap. For a VM with 4 cores, the cap is 8 Gbps.
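To put the 8 Gbps cap in perspective against the peak figures in the question, here is a back-of-the-envelope check. EgressHeadroom is a hypothetical name; the sketch assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and, as a simplifying assumption, counts PD writes one-for-one against the cap alongside network sends.

```java
// Back-of-the-envelope egress check; class name and unit conventions are assumptions.
class EgressHeadroom {
    // Egress cap for a 4-core GCE VM, as stated in the answer.
    static final double EGRESS_CAP_BITS_PER_SEC = 8e9;

    static double bitsPerSecond(double bytesPerSecond) {
        return bytesPerSecond * 8;
    }

    /** Fraction of the egress cap used by network sends plus PD writes (both in bytes/s). */
    static double capFraction(double networkSentBytesPerSec, double pdWriteBytesPerSec) {
        return bitsPerSecond(networkSentBytesPerSec + pdWriteBytesPerSec) / EGRESS_CAP_BITS_PER_SEC;
    }

    public static void main(String[] args) {
        double sent = 280e3;      // peak network send from the question: 280 KB/s
        double pdWrites = 0.75e6; // peak disk write from the question: 0.75 MB/s
        System.out.printf("Peak egress: %.2f Mbps (%.3f%% of the 8 Gbps cap)%n",
                bitsPerSecond(sent + pdWrites) / 1e6, capFraction(sent, pdWrites) * 100);
    }
}
```

With the question's peaks this comes to about 8.24 Mbps, roughly 0.1% of the cap, so egress throughput alone looks unlikely to explain the stalls.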

b) The maximum sustained IOPS of GCE disks are documented in this article. Using Local SSD might increase your performance, but data on those disks is not durable: it persists only until you stop or delete the VM.
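The same kind of check can be applied to the disk limits quoted in the question. DiskHeadroom is a hypothetical name, and the sketch conservatively assumes the 1,500 IOPS limit is shared by reads and writes rather than applied to each separately.

```java
// Disk utilization against the sustained limits quoted in the question (hypothetical helper).
class DiskHeadroom {
    /** Fraction of a limit in use; the limit must be positive. */
    static double utilization(double used, double limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return used / limit;
    }

    public static void main(String[] args) {
        double iops = 80 + 30;            // peak reads + writes per second
        double throughputMB = 1.6 + 0.75; // peak read + write MB/s
        System.out.printf("IOPS: %.1f%% of the 1,500 limit%n", utilization(iops, 1500) * 100);
        System.out.printf("Throughput: %.1f%% of the 24 MB/s limit%n", utilization(throughputMB, 24) * 100);
    }
}
```

At the question's peaks this works out to roughly 7.3% of the IOPS limit and 9.8% of the throughput limit.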

c) Stackdriver can help you monitor the resources in your project and identify bottlenecks.

Carlos