
We have been asked to monitor the performance of a running batch job. The batch job runs on the Jeus Application Server, on a 48-core HP-UX server, and has about 1500 threads. The exception that occurs most often is NumberFormatException. The batch job does not terminate, though; it continues to run.

While monitoring with HPJmeter, I noticed that thousands of exceptions are being thrown. NumberFormatException is just one of the more frequent ones; there are many others. I have the following questions:

  • Is this indicative of bad design/coding?
  • Does the application server usually handle a lot of exceptions and not report them?
  • Does this affect the performance of the running applications? (Around 11000 exceptions were thrown in about 45 minutes of running.)

Thanks, Aditya.

Aditya

2 Answers

  1. Yes, especially since the developers have not at least wrapped these in custom exceptions. Failing that, the exceptions should be written to a log file as warnings. There's a reason logging libraries exist.
  2. If the exceptions are real, they could be caused either by broken code or by a change in the dataset. I'd recommend tracing at least one job to understand why the errors occur. Having worked with petabytes of data in a job before, I understand how frustrating that can be, but you'll have hell to pay if the output of this job is consumed downstream and causes problems there.
  3. If the compute path that throws the exception is relatively light, then the IO and function calls triggered by an exception will cost a lot compared to the computation itself. However, 11k exceptions in 45 minutes is only about 4 per second (11,000 / 2,700 s). That is certainly bad, but assuming no other applications are also performing a lot of IO, it will not block your job too badly.
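As a rough illustration of points 1 and 3, here is a minimal sketch, not the batch job's actual code. It logs each parse failure as a warning, counts failures in a thread-safe way, and works out the exception rate quoted above. The names `SafeParse`, `parseOrDefault`, `failureCount`, and `perSecond` are hypothetical, and returning a fallback value is an assumed policy that may not suit every job. The logging uses the standard `java.util.logging` package.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not taken from the job in the question.
final class SafeParse {
    private static final Logger LOG = Logger.getLogger(SafeParse.class.getName());
    // Thread-safe counter, since the job runs many threads concurrently.
    private static final AtomicLong FAILURES = new AtomicLong();

    private SafeParse() {}

    // Parses an int; on bad input, logs a warning, counts it, and returns the fallback.
    static int parseOrDefault(String raw, int fallback) {
        if (raw == null) {
            return recordFailure("null", fallback);
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return recordFailure(raw, fallback);
        }
    }

    private static int recordFailure(String raw, int fallback) {
        FAILURES.incrementAndGet();
        LOG.log(Level.WARNING, "Unparseable number: [{0}]", raw);
        return fallback;
    }

    static long failureCount() {
        return FAILURES.get();
    }

    // Average events per second over a run of the given length in minutes.
    static double perSecond(long count, long minutes) {
        return count / (minutes * 60.0);
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));
        System.out.println(parseOrDefault("12a", -1));
        System.out.println("failures so far: " + failureCount());
        System.out.printf("rate: %.2f exceptions/s%n", perSecond(11000, 45));
    }
}
```

With a counter like this, the failure rate can be read from the application itself instead of only from a profiler, and the warnings in the log show which inputs are failing.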
Arcymag
  • We do have a lot of batch jobs running simultaneously. The server in question also acts as a DB server (Oracle), so a LOT of IO is happening concurrently. Thanks for the explanation though. – Aditya Jul 18 '12 at 04:41

The obvious responses to this are:

  1. A lot of what your batch job is trying to do is most probably not getting done.
  2. But you never know: an insane developer may try to fix things in a catch block and simply swallow the exception (a false negative).
  3. If the code has been working until now and has only just started to throw exceptions, either your dataset has changed or the developer is throwing the correct exceptions.
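To make point 2 concrete, here is a small sketch of what swallowing an exception in a catch block looks like, next to a variant that keeps track of what was rejected. The class and method names (`SwallowDemo`, `sumSwallowing`, `sumReporting`) are made up for illustration and are not from the batch job in question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example; names are invented for illustration.
final class SwallowDemo {
    private SwallowDemo() {}

    // Anti-pattern: bad records vanish silently, so the total looks valid.
    static long sumSwallowing(List<String> fields) {
        long total = 0;
        for (String field : fields) {
            try {
                total += Long.parseLong(field);
            } catch (NumberFormatException ignored) {
                // Swallowed: no log entry, no counter, no trace of the bad record.
            }
        }
        return total;
    }

    // Same total, but every rejected record is collected for the caller to inspect.
    static long sumReporting(List<String> fields, List<String> rejected) {
        long total = 0;
        for (String field : fields) {
            try {
                total += Long.parseLong(field);
            } catch (NumberFormatException e) {
                rejected.add(field);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("10", "x", "5");
        List<String> rejected = new ArrayList<>();
        System.out.println(sumSwallowing(fields));
        System.out.println(sumReporting(fields, rejected) + " rejected=" + rejected);
    }
}
```

Both methods return the same number, which is the problem: with the first one, nobody can tell that a record was skipped, while the job still pays the cost of throwing and catching.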
Siddharth
  • The Batch Job is working just fine. We were asked to do some performance monitoring when I noticed this. It is eating up CPU resources. These exceptions are not show stoppers. Thanks for the reply though. – Aditya Jul 18 '12 at 04:41
  • So yeah, the batch job is trying to do all this crap, but no results, so you are wasting cpu cycles. Instead just catch the error on the ui. – Siddharth Jul 18 '12 at 04:53