
We host a SaaS application that is customized for multiple clients. One customer in particular is reporting sporadic performance issues from various locations on their network, specifically when uploading documents through a form on our website.

The client claims they have "bandwidth to spare" and that utilization of their "pipe" is so low that it MUST be our application, but our application has MANY clients and all features are working fine for all other clients.

Interestingly enough:

  • Downloads (i.e. just accessing the website or downloading documents) are working fine.
  • A speed test shows that they should get 1.2 Mbps up, so a 3 MB file should take about 20 seconds to upload. It takes 60+ seconds on their network. Sometimes even small files take over 10 minutes to upload, or they time out.
  • Pings and traceroutes don't show any abnormally long hops or response times.
  • They claim other SaaS applications they use allow them to upload just fine.
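
To make the upload arithmetic above explicit, here is a minimal sketch in Python. It assumes 1 MB = 8 megabits and ignores protocol overhead and latency, so the result is a best case rather than a prediction. The function names are made up for illustration.

```python
def expected_upload_seconds(size_megabytes: float, uplink_mbps: float) -> float:
    """Best-case transfer time, ignoring protocol overhead and latency."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / uplink_mbps


def effective_mbps(size_megabytes: float, observed_seconds: float) -> float:
    """Throughput actually achieved for an upload that took observed_seconds."""
    return size_megabytes * 8 / observed_seconds


# Figures from the question: 3 MB file, 1.2 Mbps uplink, 60 s observed.
print(f"ideal time: {expected_upload_seconds(3, 1.2):.0f} s")
print(f"achieved:   {effective_mbps(3, 60):.2f} Mbps")
```

A 60-second upload of the same file therefore corresponds to roughly a third of the measured uplink rate, which is the gap that needs explaining.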

Both IT teams are working together to resolve this issue. What kind of data can I request from the client to begin ruling things out?

It seems like we need to somehow measure the latency of the networks involved, possibly even at the switch level. We need to understand whether packets are getting dropped somewhere, and why.
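
One rough way to gather that kind of data from a client machine is to time TCP connections to the web server instead of relying on ICMP. The sketch below is only an illustration using the Python standard library; the function names, the default port 443, and the one-second interval are assumptions, not anything taken from the actual setup.

```python
import socket
import statistics
import time
from typing import Dict, List, Optional


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 5.0) -> Optional[float]:
    """Time one TCP handshake in milliseconds; None if it fails or times out."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # covers timeouts, refused connections and DNS errors
        return None


def summarize(samples: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Reduce a list of samples (None = failed attempt) to a few statistics."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "min_ms": min(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def probe(host: str, port: int = 443, count: int = 10, interval: float = 1.0):
    """Run `count` handshakes, `interval` seconds apart, and summarize them."""
    samples = []
    for _ in range(count):
        samples.append(tcp_connect_ms(host, port))
        time.sleep(interval)
    return summarize(samples)
```

Running something like `probe("app.example.com", count=100)` (hypothetical host name) from an affected location and from an unaffected one gives two summaries to compare. A handshake takes about one round trip, so connect times that cluster around a second or more often point to a retransmitted SYN, which would hint at packet loss rather than plain latency.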

Where should I start? Any help is appreciated. I'll provide more info upon request.

tresstylez
  • If you're running the web site on IIS you can configure IIS to log time-taken and then look at that after one of these problem uploads. That should give you an idea of whether or not there's a real problem. See http://support.microsoft.com/kb/944884 – joeqwerty Oct 18 '12 at 01:41
  • Barring my first suggestion, you should use something like iperf or Qcheck to test for latency with real traffic, not ICMP packets. – joeqwerty Oct 18 '12 at 01:44

1 Answer


Intermittent performance problems can be a true horror to track down, as you have probably noticed by now. When I attack this kind of problem, I typically try to isolate the various systems involved and examine each one in turn:

  1. Ask the client to use a "clean" system. No AV/AM/watch-what-our-employees-are-doing-app-2012. Ask the client to record when they notice unusual behavior. (This covers most configuration errors on the client side.)
  2. Do a long-term (24h+), continuous measurement on the link to the client and from the client (if possible). ICMP is a decent but not perfect choice. There are lots of specialized tools that can be used. Make sure you measure with large frames, as path MTU issues are a common cause of bad performance. (This covers most network errors.)
  3. Scrutinize your OS (and hypervisor, if virtualized) logs. Pay particular attention to performance-related statistics, like memory usage, lost network frames and CPU usage. If you are not logging these yet, start logging right now. Having a baseline to compare against makes life a lot easier. (This catches many hardware and software problems.)
  4. As joeqwerty suggests, ask your webserver and database to log any long-running requests. (This helps you track down whether the problem is actually in your application.)
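
For step 4, if the web server happens to be IIS with the time-taken field enabled (as in joeqwerty's comment), the slow requests can be pulled out of the W3C-format logs with a few lines of Python. This is a minimal sketch under that assumption: it relies on the `#Fields:` header line that W3C extended logs carry and on time-taken being recorded in milliseconds. The function name and the 10-second threshold are arbitrary choices for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def slow_requests(lines: Iterable[str], threshold_ms: int = 10_000) -> Iterator[Dict[str, str]]:
    """Yield log entries whose time-taken is at least threshold_ms."""
    fields: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of every entry that follows it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives and anything before a header
        row = dict(zip(fields, line.split()))
        taken = row.get("time-taken", "")
        if taken.isdigit() and int(taken) >= threshold_ms:
            yield row
```

Passing an open log file to `slow_requests` and printing the date, time, cs-uri-stem and time-taken of each result gives a list of timestamps that can be lined up against the times the client recorded in step 1.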
pehrs