3

I have a long-running Java process on a CentOS machine. Both info and error logging are set up properly. The process ran for a long time (18+ hours) and then suddenly disappeared. The logs show no trace of an error or exception (no OutOfMemoryError, no out-of-disk-space error). How can I figure out what really happened, that is, why and how the process got killed?

These are the OS details.
CentOS release 5.11 (Final)
Kernel \r on an \m

Are there any standard system logs or commands that would help figure this out? The job runs in a servlet in Tomcat, and Tomcat is also going down mysteriously.

piet.t
Vineel
  • You might want to go through this question: https://stackoverflow.com/questions/37497680/reason-for-sudden-jvm-crash – Himanshu Bhardwaj Feb 18 '19 at 06:40
  • Well I think doesn't help: you need to gather more info from the system. Let the process talk a little more, so you see when it stops, monitor it.... (btw how do you know, that it gets killed?) – kai Feb 18 '19 at 06:49

1 Answer

5

Your process was most likely killed because the system ran out of memory. When that happens, the kernel's OOM killer prefers to kill short-running processes over long-running ones. An OOM kill is unlikely to show up in your application logs, because the kernel terminates the process with SIGKILL and the JVM gets no chance to log anything.

Check the output of dmesg and look for a message about killing <java_pid>.
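If the kernel log is long, a small filter can help. The following is only a sketch: the function name find_oom_kills and the regular expression are my own, and the pattern assumes the kernel prints something like "Killed process <pid> (<name>)" or "Kill process <pid> (<name>)". The exact wording differs between kernel versions, so adjust the pattern to what your system actually prints. It reads a file, for example one saved with `dmesg > dmesg.txt`, or /var/log/messages, where CentOS normally stores kernel messages.

```python
import re
import sys

# Assumed wording: "Killed process <pid> (<name>)" or "Kill process <pid> (<name>)".
# Kernel versions phrase this differently, so treat the pattern as a starting point.
OOM_LINE = re.compile(r"kill(?:ed)? process (\d+)(?: \(([^)]+)\))?", re.IGNORECASE)


def find_oom_kills(lines):
    """Return a (pid, name, line) tuple for every line that looks like an OOM kill."""
    hits = []
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            # name is None when the line carries no "(name)" part
            hits.append((int(match.group(1)), match.group(2), line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_oom.py dmesg.txt
    with open(sys.argv[1]) as log_file:
        for pid, name, line in find_oom_kills(log_file):
            print(pid, name, line)
```

If the PID reported there matches the PID your Java or Tomcat process had, the OOM killer is the likely culprit.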

Here is how the "badness" of a task is determined when Linux picks one to kill (https://www.kernel.org/doc/gorman/html/understand/understand016.html#toc21):

badness_for_task = total_vm_for_task / (sqrt(cpu_time_in_seconds) *
sqrt(sqrt(cpu_time_in_minutes)))

The kernel steps through all running tasks, computes this score for each, and kills the one with the highest badness.
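To get a feel for the formula, here is a rough Python transcription. It is a sketch, not the kernel code: the real implementation uses integer arithmetic and applies further adjustments, and clamping both time values to at least 1 is my own assumption to avoid dividing by zero for brand-new tasks. The function name and parameters are illustrative.

```python
import math


def badness(total_vm, cpu_time_seconds):
    """Approximate the badness score from the formula quoted above."""
    # Clamp to 1 so that a task with almost no CPU time does not divide by zero
    # (an assumption of this sketch, not part of the quoted formula).
    seconds = max(cpu_time_seconds, 1.0)
    minutes = max(cpu_time_seconds / 60.0, 1.0)
    return total_vm / (math.sqrt(seconds) * math.sqrt(math.sqrt(minutes)))


if __name__ == "__main__":
    same_memory = 1000000
    young = badness(same_memory, 60)            # one minute of CPU time
    old = badness(same_memory, 18 * 60 * 60)    # eighteen hours of CPU time
    print("young task scores higher:", young > old)
```

With equal memory, the older task scores far lower, which matches the preference for killing short-running processes. The numerator is still the task's memory size, though, so a JVM that is much larger than everything else on the machine can end up with the highest score despite its age.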

Tomas F
Some Name