
We have a small network (maybe 15 users at peak business hours, around 30 devices) that carries both our internet connection and our phone lines.

The primary use of the network is for users to access the web-based, externally hosted database interface with which they manage the business (clients, sales, invoices, etc.). This web-based database interface is a PHP/MySQL application, developed over the last year and a half. As the primary (if not only) programmer, I have access to everything about this application and our network, from our hosting solution to the source code to the choice of technologies.

Some months ago, we noticed a severe slowdown on something like 5% of the clicks made in the database interface. What I mean is, most clicks deliver the content in one to two seconds, then, sometimes, one of the clicks takes up to one minute to load.

What puzzles me is:

  • It does not seem to be related to the content of the page. Loading the same page over and over again works 95% of the time, with one click sometimes taking 1000% to 2000% more time to load. Loading different pages over and over again gives the same results.
  • The web-based interface does not slow down when accessed from outside the office. That is, it slows down only when used from inside our local network.
  • Other websites do not seem to be slowed down. Stopping the stalled page and reloading it works fine, which makes me think the problem lasts a few seconds, blocking all clicks made during that window but not those made two seconds later.
  • The website uses jquery and jquery-ui, as well as some other libraries (jquery cookie master, xdate). The slowness happens whether they are loaded from our servers or directly from jquery and ajax.googleapis.

Faced with all this, I consulted network professionals, and in the end we changed our network equipment. We now use a Cisco ASA 5505 firewall and a managed Cisco Catalyst switch.

  • Before the change, we noticed that pings to google.com would sometimes time out or take up to 13000 ms, whereas normally our ping is 20-30 ms.
  • We also learned from our ISP that we had up to 800 GB of uploads a month! We do handle photos and heavy files, but 800 GB is far from normal use. For a time, we thought high upload traffic could be preventing pages from loading their resources. Our ISP's history shows that the slowness and the massive uploads started at roughly the same time. I can't tell with certainty which came first; both seem to have started in the same week.
  • We also had numerous packet loss errors.
  • I can't tell for sure which protocols the slowness does or does not affect. I have not directly seen a download fail, and users have not complained about failed downloads, but uploading files over FTP through NetBeans is affected. However, users use Transmit to transfer files to our clients and back, and I haven't had complaints about files not being uploaded correctly or clients receiving corrupted files.

Sadly, just because I haven't had complaints does not mean it has not happened, as communication with users is somewhat tense these days. I'd say the slow network is the cause [pun intended].

When the experts we hired came to install the new equipment, they did some configuration and monitoring. Now our ping is back to normal, and packet loss errors happen far less often, to the point where we think they are just the normal packet loss of the internet. Also, from what we can see (it's only been a few days), the gigantic uploads have stopped.

But 5% of the clicks still take a long time to load.

I tried debugging with the Net tab of Firebug to see which part of the website loads slowly. The server itself responds in 200-800 ms, depending on the complexity of the page, which seems OK. Most of the images load fine, and so do the libraries, but when the slowness happens, one or more of those images or libraries will wait forever before loading. It is not always the same library or image.

My thinking is that somehow, when images and libraries are being loaded to display the page, the network load prevents some resources from reaching their destination correctly.

How can I pin down what is preventing a specific resource, be it an image or a JS library, from loading?

I lack the technical skills to use Wireshark or other advanced (from my humble point of view) networking tools, but I will learn them if I must. That being said, monitoring at this point seems irrelevant: I don't want to see it being slow, I know it is slow. I want to know what prevents resources from reaching their destination computers on our local network.

  • Just so we're all clear, you state `" the web based database interface"` -- since your question is quite long, does this mean that it is a hosted solution external to your network? Or is it located internally? It sounds like it is external from your "troubleshooting" but clarification will help. If it's external do you have full access to this system or is it a 3rd party solution? – TheCleaner Nov 24 '14 at 18:37
  • @TheCleaner this `web based database interface` is a full php mysql application I programmed for my clients. it is hosted externally. – Félix Gagnon-Grenier Nov 24 '14 at 18:46

2 Answers


You will probably need to use nmap (or Wireshark, for example) to inspect the local network. It can help you find a Windows computer that has a virus and is sending thousands of spam ARP requests, or a user running a BitTorrent client, or whatever else could be saturating your local network or your internet upload.

The other option is that most ISPs are far from perfect: perhaps the ISP sometimes has packet loss or upload stability problems. Installing a monitoring tool like smokeping and monitoring a target on the internet could help you see this (packet loss, slow upload, slow ping), and also let you see when it happens (every time John Doe is at the office and plugs his computer into the network?).
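As a rough illustration of that kind of monitoring (this is not smokeping itself, only a minimal Python sketch of the same idea), the script below times TCP connections to a target in rounds and prints one timestamped summary per round, so slow periods can be matched against what was happening in the office. The function names, the port, and the `example.com` target are all assumptions made for this sketch.

```python
import socket
import time
from datetime import datetime


def tcp_connect_ms(host, port=443, timeout=5.0):
    """Return the TCP connect time in milliseconds, or None on failure/timeout."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, DNS failures and refused connections.
        return None


def summarize(samples):
    """Summarize a list of latencies in ms; None entries count as losses."""
    ok = [s for s in samples if s is not None]
    lost = len(samples) - len(ok)
    return {
        "sent": len(samples),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "min_ms": min(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }


def monitor(host, rounds, probes_per_round=10, pause_s=60.0, probe=tcp_connect_ms):
    """Probe `host` in rounds and print one timestamped summary line per round."""
    for _ in range(rounds):
        stats = summarize([probe(host) for _ in range(probes_per_round)])
        print(datetime.now().isoformat(timespec="seconds"), host, stats)
        time.sleep(pause_s)
```

Calling something like `monitor("example.com", rounds=480, pause_s=60.0)` from a machine inside the office would cover a working day. Running the same script from outside the office at the same time gives a baseline to compare against.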

neofutur
  • some packet loss and upload stability problems could really amount to 800gb of uploads a month? – Félix Gagnon-Grenier Nov 24 '14 at 18:53
  • i also say it could be a virus or a bittorrent client . . . one sure thing is that, when your upload is saturated, everything gets slow, even download, and the ping can jump easily to 2000+ ms ; nmap will help you find the computer and the protocol, port used to saturate upload – neofutur Nov 24 '14 at 18:55
  • I had this case once while consulting for a company, a short nmap session allowed me to find a computer that had a virus and was saturating the network, as soon as I told them "wtf is this 192.168.1 X ! shutdown this s**tty computer", the network problems were gone. – neofutur Nov 24 '14 at 19:02
  • and depending on what you are doing, and the upload ratio of your internet line, 800 gb monthly can be normal or huge, 15-30 users can generate quite a bit of upload . . . I can't know. what are the specs of your internet ( download rate, upload rate ) – neofutur Nov 24 '14 at 19:08
  • to put it in context: the primary use of the network is to consult our web based interface. The data used monthly as measured by our host (that is, the amount of data which is actually used to access the interface, that being the biggest part of our activity) amounts to 30gb! that means 770 gb are uploaded, we don't know where to and we don't know by whom: all the files transferred to our clients pass through our ftp (so they are included in the 30 gb I mentioned) – Félix Gagnon-Grenier Nov 24 '14 at 19:12
  • use nmap ;) find the additional 770 gb. but only 30 gb monthly upload for 15-30 users . . . i can hardly believe this number. – neofutur Nov 24 '14 at 19:15
  • also don't believe everything is secure by default, there's probably a hole somewhere in your firewall or router that allows 770 more GBs to be uploaded. – neofutur Nov 24 '14 at 19:18
  • As mentioned at the end of my (long :p) question, we now have correctly configured professional grade pieces of equipment managing the network, and the uploads seem to have stopped, but somehow there are still some slow times – Félix Gagnon-Grenier Nov 24 '14 at 19:20
  • eh even " correctly configured professional grade" stuff have holes and backdoors nowadays – neofutur Nov 24 '14 at 19:24

So, without actually getting on your network(s) and diagnosing in detail with you, an answer here will likely be more of a "point you in a direction" and see if it works.

That said, I take a sort of Occam's razor approach when dealing with this kind of thing.

You stated:

"the web based interface faces no slowing down when accessed from outside the office. that is, it slows down only when used behind our local network."

This here, if truly accurate, is the piece to key in on.

The issue SHOULD lie then somewhere within that "local network". It's NOT on the externally hosted server/application, otherwise the same issues could be replicated from another location.

So where? You've already swapped out a few pieces of network gear to no avail it would seem.

Here's what I would suggest you do, which may sound simple, but work from the farthest external point of your network backwards:

  1. Take a laptop that has NEVER been on the local LAN, isn't part of a domain, etc. and attach it directly to the local network's ISP. Directly is the key. Put a laptop directly on the ISP's network without a firewall or similar in the mix and see what kind of responses/performance you then get and compare it to your known good external setup that works fine.
  2. If you don't see issues in step #1, proceed to move the laptop one layer back, directly attached to the ASA firewall, and then test again through the firewall.
  3. If no issues at #2, then go back another layer, this time behind the managed switch.
  4. If no issues at #3, then you know it isn't a network issue.

And so on and so forth (trying a client that has "issues" externally, etc.) until you at least can objectively say what/where the issue occurs, even if you don't know why or specifically what is causing it at a particular "layer". Then you start a deeper dive into that specific "layer" if you will.
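To make the comparison between layers objective rather than a matter of impression, the same measurement can be repeated at each position. The following is only a minimal Python sketch of that idea, not a finished tool: it downloads each URL many times and reports how often a request was slow or failed. The function names, the 5-second threshold, and the number of attempts are assumptions for illustration; the URLs would be the actual images and libraries that stall in the Firebug Net tab.

```python
import http.client
import time
import urllib.request


def fetch_seconds(url, timeout=90.0):
    """Download `url` completely and return elapsed seconds, or None on error."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None
    return time.perf_counter() - start


def slow_rate(timings, threshold_s=5.0):
    """Return (slow_count, total, percent); failed requests count as slow."""
    slow = sum(1 for t in timings if t is None or t > threshold_s)
    total = len(timings)
    return slow, total, (100.0 * slow / total if total else 0.0)


def run_trial(urls, attempts=100, threshold_s=5.0, fetch=fetch_seconds):
    """Fetch every URL `attempts` times and return {url: (slow, total, percent)}."""
    results = {}
    for url in urls:
        timings = [fetch(url) for _ in range(attempts)]
        results[url] = slow_rate(timings, threshold_s)
    return results
```

Run it with the same list of URLs from the laptop at each step above. If roughly 5% of requests are slow behind the switch but none are slow when attached directly to the ISP, the layer in between is where to dig deeper.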

TheCleaner