We have a small network (around 15 users at peak business hours and about 30 devices) that handles our internet connection and phone lines.
The primary use of the network is for users to access the web-based, externally hosted database interface with which they manage the business (clients, sales, invoices, etc.). This interface is a PHP/MySQL application, developed over the last year and a half. As the primary (if not only) programmer, I have access to everything about this application and our network, from our hosting solution to the source code to the choice of technologies.
Some months ago, we noticed a severe slowdown on roughly 5% of the clicks made in the database interface. That is, most clicks deliver the content in one to two seconds, but occasionally a click takes up to one minute to load.
What puzzles me is:
- It does not seem to be related to the content of the page. Loading the same page over and over again works 95% of the time, with an occasional click taking 1000% to 2000% more time to load. Loading different pages over and over again gives the same results.
- The web-based interface shows no slowdown when accessed from outside the office; it slows down only when used from behind our local network.
- Other web pages do not seem to be slowed down, and stopping the stalled page and reloading it works fine. This makes me think the problem lasts a few seconds, blocking all clicks made during that window but not those made two seconds later.
- The website uses jQuery and jQuery UI, as well as some other libraries (jquery cookie master, xdate). The slowness happens whether they are loaded from our servers or directly from jquery and ajax.googleapis.
Faced with all this, I consulted network professionals, and we eventually replaced our network equipment: we now use a Cisco ASA 5505 firewall and a managed Cisco Catalyst switch.
- Before the change, we noticed that pings to google.com would sometimes time out or take up to 13000 ms, whereas we normally see 20-30 ms.
- We also learned from our ISP that we had up to 800 GB of uploads a month! We do handle photos and heavy files, but 800 GB is far from normal use. For a time, we thought heavy upload traffic could be preventing pages from loading their resources. Our ISP's history shows that the slowness and the massive uploads started at roughly the same time. I can't say with certainty which came first; both seem to have started in the same week.
- We obviously had numerous packet loss errors.
- I can't tell for sure which protocols the slowness does or does not affect. I have not directly experienced a failed download, and no users have complained that their downloads failed, but uploading files over FTP through NetBeans is affected. However, users use Transmit to transfer files to our clients and back, and I haven't had complaints about files not being uploaded correctly or clients receiving corrupted files.
Sadly, just because I haven't had complaints does not mean it has not happened, as communication with users is somewhat tense these days. I'd say the slow network is the cause [pun intended].
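To put the 800 GB figure from the list above in perspective, here is the back-of-the-envelope arithmetic. It assumes decimal gigabytes and a 30-day month (both are my assumptions; the ISP report did not specify either), and it only gives an average, not the peaks.

```python
# Rough arithmetic: what sustained upload rate does 800 GB per month imply?
# Assumptions (not measured): decimal gigabytes and a 30-day month.
GB = 10**9  # bytes in one decimal gigabyte


def average_upload_mbps(gigabytes_per_month: float, days: int = 30) -> float:
    """Return the average sustained rate in megabits per second."""
    bits = gigabytes_per_month * GB * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 10**6


rate = average_upload_mbps(800)
print(f"{rate:.2f} Mbit/s sustained, around the clock")
```

That is an average of roughly 2.5 Mbit/s of upload around the clock. Whether that is enough to saturate our uplink depends on the line's capacity, which I have not stated here.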
When the experts we hired came to install the new equipment, they did some configuration and monitoring. Now our ping is back to normal, and packet loss errors happen far less often, to the point where we think they are just the normal packet loss of the internet. Also, from what we can see (it's only been a few days), the gigantic uploads have stopped.
But 5% of the clicks still take a long time to load.
I tried debugging with the Net tab of Firebug to see which part of the website loads slowly. The server itself responds in 200-800 ms, depending on the complexity of the page, which seems OK. Most of the images load fine, and so do the libraries, but when the slowness happens, one or more of those images or libraries will wait a very long time before loading. It is not always the same library or image.
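If the Net panel data can be exported as a HAR file (an assumption on my part; I believe Firebug needs an extension such as NetExport for this), the timing phases of each request could be compared with a short script. This is only a sketch: the function names and the 5-second threshold are made up for illustration, and it relies on the standard HAR `timings` fields (`blocked`, `dns`, `connect`, `send`, `wait`, `receive`).

```python
import json

# Timing phases defined by the HAR format; -1 (or a missing key) means "not applicable".
PHASES = ("blocked", "dns", "connect", "send", "wait", "receive")


def load_har(path: str) -> dict:
    """Read a HAR export (plain JSON) from disk."""
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def _ms(value) -> float:
    """Treat missing, null or negative timings as zero milliseconds."""
    return value if isinstance(value, (int, float)) and value > 0 else 0


def slow_entries(har: dict, threshold_ms: float = 5000.0) -> list:
    """Return (url, total_ms, dominant_phase, phase_ms) for each slow request."""
    results = []
    for entry in har["log"]["entries"]:
        total = _ms(entry.get("time"))
        if total < threshold_ms:
            continue
        timings = entry.get("timings", {})
        durations = {phase: _ms(timings.get(phase)) for phase in PHASES}
        dominant = max(durations, key=durations.get)
        results.append((entry["request"]["url"], total, dominant, durations[dominant]))
    # Slowest request first.
    return sorted(results, key=lambda row: row[1], reverse=True)
```

For example, `slow_entries(load_har("capture.har"))` (with a hypothetical file name) would list the stalled resources along with the phase that took the most time. A stall dominated by `connect` or `dns` would point at the network path, while one dominated by `wait` would point back at the server.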
My thinking is that somehow, when the images and libraries needed to display the page are being loaded, the network load prevents some resources from reaching their destination correctly.
How can I pin down what is preventing a specific resource, be it an image or a JS library, from loading?
I lack the technical skills to use Wireshark or other advanced (from my humble point of view) networking tools, but I will learn them if I must. That being said, monitoring at this point seems irrelevant: I don't want to see it being slow, I know it is slow. I want to know what prevents resources from reaching their destination computers in our local network.
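Short of a full packet capture, the kind of measurement I could run from an office machine is to time each phase of a single request separately, so that a stall can at least be attributed to DNS, the TCP connection, or the wait for the first byte. The following is a minimal sketch using only Python's standard `socket` module; the function names are my own, it handles plain HTTP only (no TLS), and the host and path passed in would be whatever resource is stalling.

```python
import socket
import time


def probe(host: str, port: int = 80, path: str = "/", timeout: float = 60.0) -> dict:
    """Time DNS lookup, TCP connect and time-to-first-byte for one plain-HTTP GET."""
    t0 = time.monotonic()
    # Resolve the name; take the first address the resolver returns.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t1 = time.monotonic()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
        t2 = time.monotonic()
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        sock.recv(1)  # blocks until the first response byte arrives (or times out)
        t3 = time.monotonic()
    return {
        "dns_ms": (t1 - t0) * 1000,
        "connect_ms": (t2 - t1) * 1000,
        "first_byte_ms": (t3 - t2) * 1000,
    }


def slowest_phase(sample: dict) -> str:
    """Name the phase that took the longest in one probe result."""
    return max(sample, key=sample.get)
```

Calling `probe()` in a loop against the application's host, both from inside the office and from outside, and keeping only the samples where some phase exceeds a few seconds, should show whether the stalls sit in name resolution, connection setup, or the response itself.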