
We have a web application that ran for 2 years without any problems. A week ago the response times suddenly became very bad: about 10-50 times slower than normal.

At any given time there are maybe 10-20 users on the system. 90% of the user requests result in a database request. The system responds normally in the early morning and in the evening, when few users are online.

  1. How can we detect the problem? Is there step-by-step documentation for resolving it?
  2. Are there specialised companies or specialists who could help us solve the problem?

Environment

Windows Server 2003
Quad-core Intel Xeon X3220, 2.4 GHz, 2 GB RAM
Sybase Anywhere 9 Database - Driver: jconn3.jar
Glassfish 2.1
Internet bandwidth of server: 100MB/s

Applications

Web application with SmartGWT frontend (SmartGWT 2.4)
Web service accessed by an external company

No EJBs, only the web container

First of all, it doesn't seem that the hardware is at its limit.
java.exe is sometimes at 25% CPU usage when heavy requests are made, using 374 MB RAM
Sybase DB server: 220 MB RAM
Available memory: always around 1 GB

Snapshot of requests

I took a snapshot of all requests during 8 minutes:

210 seconds client requests (gwtservice) 45%
Total 967 requests, 212 milliseconds per request

100 seconds webservice (BankOrderService) 20%
Total 86 requests, 1170 milliseconds per request

160 seconds loading frontend elements into browser (.js, .png, jpg, .css etc.) 35%
Total 623 requests, 250 milliseconds per request
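
As a rough cross-check of the figures above, the shares and per-request averages can be recomputed from the totals. This is only a sketch: the labels and the `summarize` helper are made up for illustration, and because the totals quoted above are rounded, the recomputed averages come out slightly different from the ones listed.

```python
# Totals from the 8-minute snapshot: (label, total seconds, request count).
# The labels are illustrative names, not identifiers from the application.
snapshot = [
    ("gwtservice (client requests)", 210, 967),
    ("BankOrderService (web service)", 100, 86),
    ("static frontend files", 160, 623),
]


def summarize(rows):
    """Return (label, share of total time in %, average ms per request)."""
    total_seconds = sum(seconds for _, seconds, _ in rows)
    return [
        (label, 100.0 * seconds / total_seconds, 1000.0 * seconds / count)
        for label, seconds, count in rows
    ]


for label, share, avg_ms in summarize(snapshot):
    print(f"{label:32s} {share:5.1f}% of time, {avg_ms:7.1f} ms/request")
```

The web service stands out with more than a second per request on average, while the static files take about as long per request as the database-backed gwtservice calls.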

Example of most time consuming requests (in milliseconds):

15427.302 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/091FF14E7C1D1187C770833D67B13321.cache.html
13558.571 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Core.js
12631.877 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Grids.js
11238.439 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Forms.js
10535.141 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/sc/modules/ISC_DataBinding.js
10003.115 25.07.2012 11:55 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
9999.412 25.07.2012 11:49 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
9999.229 25.07.2012 11:55 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
9992.415 25.07.2012 11:49 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
9990.473 25.07.2012 11:55 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
9132.848 25.07.2012 11:55 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/gwtservice
5933.174 25.07.2012 11:50 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Grids.js
5864.426 25.07.2012 11:50 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Core.js
5571.739 25.07.2012 11:50 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_DataBinding.js
5473.637 25.07.2012 11:50 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Forms.js
5158.104 25.07.2012 11:50 Erfolg user3 REMOTE_WEB xx.yy.zz.237 URI:/BankApp/org.Bank.Main/gwtservice
4488.047 25.07.2012 11:50 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/images/chf.jpg
4442.574 25.07.2012 11:56 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Core.js
4072.268 25.07.2012 11:54 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
3939.546 25.07.2012 11:56 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Grids.js
3876.443 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Foundation.js
3727.795 25.07.2012 11:50 Erfolg user4 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/gwtservice
3630.225 25.07.2012 11:48 Erfolg user4 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/091FF14E7C1D1187C770833D67B13321.cache.html
3552.007 25.07.2012 11:50 Erfolg user5 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/gwtservice
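
A listing like this can also be grouped by request type or by minute to see where the time goes. The following is a minimal sketch that assumes every line has exactly the eight whitespace-separated fields shown above; the function names and the three categories are hypothetical, and only four of the lines are used as sample input.

```python
from collections import defaultdict

# Four lines copied from the listing above, used as sample input.
SAMPLE = """\
15427.302 25.07.2012 11:50 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/091FF14E7C1D1187C770833D67B13321.cache.html
10003.115 25.07.2012 11:55 Erfolg anonymous REMOTE_WEB xx.yy.zz.25 URI:/BankWebService/BankOrderService
9132.848 25.07.2012 11:55 Erfolg user1 REMOTE_WEB xx.yy.zz.228 URI:/BankApp/org.Bank.Main/gwtservice
5933.174 25.07.2012 11:50 Erfolg user2 REMOTE_WEB xx.yy.zz.162 URI:/BankApp/org.Bank.Main/sc/modules/ISC_Grids.js
"""


def parse_line(line):
    """Parse one line; return None if it does not match the assumed layout."""
    parts = line.split()
    if len(parts) != 8 or not parts[7].startswith("URI:"):
        return None
    return {
        "ms": float(parts[0]),
        "minute": parts[1] + " " + parts[2],
        "user": parts[4],
        "ip": parts[6],
        "uri": parts[7][len("URI:"):],
    }


def categorize(uri):
    """Map a URI to one of three illustrative request categories."""
    if uri.endswith("/gwtservice"):
        return "gwtservice"
    if uri.startswith("/BankWebService/"):
        return "webservice"
    return "static"


def total_ms_by(records, key):
    """Sum the response times per group, where key(record) names the group."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record["ms"]
    return dict(totals)


records = [r for r in map(parse_line, SAMPLE.splitlines()) if r is not None]
by_category = total_ms_by(records, lambda r: categorize(r["uri"]))
by_minute = total_ms_by(records, lambda r: r["minute"])
print(by_category)
print(by_minute)
```

Grouping by minute is useful here because many of the slowest requests in the listing share the same timestamp (11:50), which hints that they were waiting on something at the same moment rather than being slow individually.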

Sessions

18 active Sessions
After the client login (provided by Glassfish, over HTTPS) has authenticated the user, there is a second login in the application itself, where the user chooses which branch to log into. After the second login, 3 attributes (username, branch, IP address) are stored in the session.

About 40%-50% of the sessions never have these 3 attributes. I interpret this to mean that the first login was made but the second was not.

examples: session id:e6df980ab67cf0456d78761eefa1
8 sessions without the 3 attributes

session id:d72d16bdabb5500e73f721475440:{username=user1, branch=000x, ipadr=xx.xx.xx.xx}
10 sessions with the 3 attributes

I thought these 8 sessions might come from an attacker. I ran Wireshark to look for suspicious IP addresses, but I haven't found much. One day there was an IP from Sweden, and we have nothing in Sweden. However, this wasn't much traffic, just a few lines in the Wireshark capture log over a few seconds.

On 7/17/2012 the MSN account of one of the users was hacked.

The problems started around that date as well, maybe with a delay of 1-2 days.
Coincidence?

Any help is highly appreciated.

Danny Beckett
Andy
  • Has anything changed in the Environment? Real-time scanner (virus, malware, etc.) scanning every file served? Windows patches? Sybase patches? Are you being DDoS'ed? Also, you might want to ask over on [su] or [webmasters.se]. – Cᴏʀʏ Jul 25 '12 at 15:58
  • Thanks a lot. As far as I know nothing changed. – Andy Jul 25 '12 at 16:22

1 Answer

This is what I would do, in order:

Restart everything, including the OS. There was the leap second problem a few weeks ago, and I had to restart our Glassfish servers because of high CPU usage. On a quad-core machine, 25% total CPU usage probably means one core is maxed out. My servers were Linux-based, and I don't know whether the problem also occurred on Windows.

Can you update your Glassfish server? It's quite old and there are a number of known denial-of-service exploits. You didn't mention your Java version, but it could probably use an update too.

Turn on access logging on the Glassfish server so you can see whether you're getting pounded with scripted requests. You can do that in the admin console.
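
Once access logging is on, one simple way to spot scripted traffic is to count requests per client per minute. The sketch below leaves the log parsing out, since that depends on the access log format you configure. It assumes you have already extracted (client IP, minute) pairs; the function name, the threshold, and the sample addresses (taken from the documentation-only IP ranges) are all made up for illustration.

```python
from collections import Counter


def busiest_clients(requests, threshold):
    """Return {(client_ip, minute): count} for every pair seen more than
    `threshold` times. `requests` is an iterable of (client_ip, minute)."""
    counts = Counter(requests)
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: one client sends three requests within a minute.
sample = [
    ("198.51.100.7", "11:50"),
    ("198.51.100.7", "11:50"),
    ("203.0.113.9", "11:50"),
    ("198.51.100.7", "11:50"),
    ("198.51.100.7", "11:51"),
]

suspects = busiest_clients(sample, threshold=2)
for (ip, minute), n in sorted(suspects.items()):
    print(f"{ip} made {n} requests at {minute}")
```

A sensible threshold depends on how many requests a normal page load generates, so it is worth measuring a known-good client first.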

Finally, look at the database. Sometimes, as tables change, the query optimizer starts making poor choices. I'm not familiar enough with Sybase to be more specific. In Oracle you need to make sure you're collecting optimizer statistics as tables grow.

JOTN