Here is a little background on an intermittent but very real problem. I maintain a web application that runs on an ESXi private cloud. We have a database server and 4 web application servers, and all 4 web application servers have the same odd problem. They all run Ubuntu 10.04.3 (kernel 2.6.32-28-server) with the Apache web server and ProFTPD for FTP.

When I transfer many images, over either FTP or HTTP (internally or externally), the transfer is very slow most of the time. For example, downloading a directory of about 400 images (< 2 megabytes) has very slow transfer speeds, sometimes coming to a complete standstill. Similarly, over HTTP, if I load one of my pages in my browser for the first time, it may struggle in the same way, depending on how many images there are. When I load the same page again it is fine because of browser caching, but as soon as I clear the cache it is back to the very sluggish speeds. I say "may" because this doesn't happen all the time; sometimes the files transfer at normal speed, which is virtually instant.

All of these machines are virtual, and I have tried rebuilding them, with no success. Someone suggested it could be a host name or naming problem, but didn't really give a clue about where to begin troubleshooting this theory.

I'm not a server guy, nor do I pretend to be one. Our servers are hosted by a 3rd party with managed hosting, but the managed hosting provider doesn't seem to know what to do with this problem. Any help would be greatly appreciated.

rizzo0917

2 Answers

Your goal here is to locate the bottleneck: the particular place in the pipeline that is causing the slowdowns.

Collect your Vitals

The first step is to collect data. The most common bottlenecks are network I/O, disk I/O, CPU, and memory constraints. You can either use a full-fledged monitoring system that collects data, like Cacti, or something local like sar, which is part of the sysstat package.

Once you have this data, you should be able to cross-reference it with the response times from your web logs and see whether any of it correlates. It might not, and then you will have to dig deeper, but think of these as the vitals that you always check first.
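As a rough illustration of that cross-referencing step, here is a minimal Python sketch. It assumes you have already extracted two series of `(epoch_seconds, value)` pairs: response times from the web log and one metric from sar. The function names and the sample numbers are hypothetical, and Apache only records per-request timing if its LogFormat includes a field such as `%D`, so treat this as a starting point rather than a finished tool.

```python
from collections import defaultdict
from math import sqrt


def bucket_mean(samples, bucket_seconds=60):
    """Average (epoch_seconds, value) samples into fixed-size time buckets."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        key = int(ts) // bucket_seconds
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlate(response_times, metric_samples, bucket_seconds=60):
    """Correlate two series over the time buckets they have in common."""
    a = bucket_mean(response_times, bucket_seconds)
    b = bucket_mean(metric_samples, bucket_seconds)
    shared = sorted(set(a) & set(b))
    return pearson([a[k] for k in shared], [b[k] for k in shared])


if __name__ == "__main__":
    # Hypothetical data: response time in ms and a disk wait percentage.
    responses = [(0, 100), (60, 200), (120, 300)]
    disk_wait = [(10, 1.0), (70, 2.0), (130, 3.0)]
    print(correlate(responses, disk_wait))
```

A value near 1 or -1 for one metric and near 0 for the others suggests where to look first; it does not prove causation, and a short or flat series can make the result meaningless (the sketch returns `None` when the coefficient is undefined).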

Less fancy, you can just watch something like vmstat output and you might get lucky and catch the problem -- but intermittent problems can be hard to solve.

Localizing the Problem is a fundamental troubleshooting skill

When solving a problem it often makes sense to use a sort of binary search along the pipeline to try to eliminate variables.

So, for example, if you are testing from a browser on a workstation, run the same tests from the server itself to cut out your workstation and the network. If the problem goes away, it is either your workstation or the network between you and the server. If you still have the problem, it must live somewhere in the server.
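One way to make that comparison repeatable is to run the same small timing script from each vantage point (workstation, another machine on the server's network, the server itself) and compare the numbers. The following is a minimal sketch using only the Python standard library; the URLs are placeholders you would replace with a handful of your own image URLs, and `time_fetches` and `slowest` are hypothetical helper names.

```python
import time
import urllib.request


def fetch(url, timeout=30):
    """Download a URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def time_fetches(urls, fetch_func=fetch, clock=time.perf_counter):
    """Fetch each URL in turn and record (url, size_bytes, seconds)."""
    results = []
    for url in urls:
        start = clock()
        size = fetch_func(url)
        results.append((url, size, clock() - start))
    return results


def slowest(results, n=5):
    """Return the n slowest fetches, slowest first."""
    return sorted(results, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    # Placeholder URLs: substitute real image URLs from the affected site.
    urls = [
        "http://localhost/images/example1.jpg",
        "http://localhost/images/example2.jpg",
    ]
    for url, size, seconds in slowest(time_fetches(urls)):
        print(f"{seconds:8.3f}s  {size:8d} bytes  {url}")
```

If the timings are consistently fast on the server but slow from the workstation, the network path or the client is the more likely suspect; if they are slow everywhere, look at the server.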

Other Places to look

Also if this is a dynamic web page there could be problems with the application itself, so you might approach this from profiling production code as well.

Lastly, one of the difficulties with virtualized environments is that other virtual servers might be impacting yours -- but you probably don't want to jump to this conclusion until you are at least monitoring your vitals.

Kyle Brandt

Slowness, either permanent or intermittent, could just be due to I/O contention. If there are many active VMs on the same host as yours, you will be competing with them for I/O bandwidth. It could also be that the host is overcommitted on memory, so the memory pages allocated to your VM(s) are actively being swapped, which has much the same effect but worse.

When transferring many small image files via HTTP, though, the problem may not be the server at all. If you are some distance away from the machine, it could simply be a latency issue. If there is 100ms of round-trip time between you and the server (you can check this with a basic ping command) and the browser uses 2 concurrent connections, then you are going to see an average of around 50ms per image added to the page load time just due to network latency.
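To put that arithmetic in terms of the 400-image directory from the question, here is a small back-of-the-envelope calculation in Python. It is a simplified model, not a measurement: it assumes each file costs a fixed number of round trips, that the connections are kept evenly busy, and it ignores the time spent actually moving bytes. The function name and the `round_trips_per_file` parameter are illustrative only.

```python
def latency_overhead_ms(num_files, rtt_ms, connections=2, round_trips_per_file=1):
    """Estimate the time added purely by network latency, in milliseconds.

    Assumes every file costs `round_trips_per_file` round trips and that
    the work is spread evenly over `connections` parallel connections.
    """
    return num_files * round_trips_per_file * rtt_ms / connections


if __name__ == "__main__":
    per_image = latency_overhead_ms(1, rtt_ms=100, connections=2)
    total = latency_overhead_ms(400, rtt_ms=100, connections=2)
    print(f"{per_image:.0f} ms per image on average")  # 50 ms per image on average
    print(f"{total / 1000:.0f} s for 400 images")      # 20 s for 400 images
```

Under these assumptions, 400 small images at 100ms round-trip time add about 20 seconds of pure waiting, no matter how fast the server is. Raising `round_trips_per_file` gives a rough feel for chattier protocols.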

The same applies to sending many files via FTP, only more so: the FTP protocol is very chatty (transferring a file takes at least two round-trips on the command connection plus the opening of a new data connection), so it is more sensitive to latency-related performance issues than HTTP. You can improve file transfer performance by using a more modern protocol like SFTP or SCP (both provided by OpenSSH and other SSH servers). FTPS is not an improvement here: it is just FTP over SSL streams, so it has all the bottlenecks of normal FTP plus those of SSL.

David Spillett