
I'm trying to understand why a server (AWS, Ubuntu) sometimes becomes unresponsive and eventually has to be restarted.

While looking through top logs, I found a process that I think might be causing the issue. When it started, the load went from about 3 to 300 within a few minutes, and then the server crashed.

However, when looking at the logs, I'm not sure how to interpret the results. It looks like the memory usage went from 5GB to 5KB, so I guess this is good? Is it possible that this kind of process could bring down the server? Or am I looking at the wrong one?

This is the top output for this process over time:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1706 www-data  35  15 10.875g 5.181g    100 D   6.2 70.9   8:35.43 php ......
 1706 www-data  35  15 10.875g 3.076g    808 R   0.0 42.1   8:37.37 php ......
 1706 www-data  35  15 10.875g 236608   1132 D   0.0  3.1   8:37.45 php ......
 1706 www-data  35  15 10.875g  26408    736 D   0.0  0.3   8:37.54 php ......
 1706 www-data  35  15 10.875g  11268    628 D   0.0  0.1   8:37.63 php ......
 1706 www-data  35  15 10.875g  10308    516 D   0.0  0.1   8:37.72 php ......
 1706 www-data  35  15 10.875g   8116    360 D   0.0  0.1   8:37.78 php ......
 1706 www-data  35  15 10.875g   6728    688 D   0.0  0.1   8:37.84 php ......
 1706 www-data  35  15 10.875g   5840    484 D   0.0  0.1   8:37.90 php ......
 1706 www-data  35  15 10.875g   5852    644 D   0.0  0.1   8:37.97 php ......
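When reading these rows it helps to normalise the memory columns first. The sketch below is only an illustration, assuming the procps-ng top defaults in which a bare number in VIRT/RES/SHR is in KiB and the suffixes m, g and t mean MiB, GiB and TiB. The helper names `top_mem_to_bytes` and `summarize` are made up for this example, and the two sample rows are copied from the output above.

```python
# Assumption: procps-ng top defaults, where a bare number is KiB and
# the suffixes k/m/g/t denote KiB/MiB/GiB/TiB.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def top_mem_to_bytes(field: str) -> int:
    """Convert a top memory field such as '5.181g' or '5852' to bytes."""
    field = field.strip().lower()
    if field[-1] in UNITS:
        return int(float(field[:-1]) * UNITS[field[-1]])
    return int(field) * 1024  # no suffix: the value is in KiB


def summarize(row: str) -> dict:
    """Pick PID, RES, state and %MEM out of one top row (default column order)."""
    cols = row.split()
    return {
        "pid": int(cols[0]),
        "res_bytes": top_mem_to_bytes(cols[5]),
        "state": cols[7],
        "mem_pct": float(cols[9]),
    }


# First and last samples from the output above (command left elided as posted).
rows = [
    "1706 www-data  35  15 10.875g 5.181g    100 D   6.2 70.9   8:35.43 php ......",
    "1706 www-data  35  15 10.875g   5852    644 D   0.0  0.1   8:37.97 php ......",
]

for row in rows:
    info = summarize(row)
    mib = info["res_bytes"] / 1024**2
    print(f"pid={info['pid']} state={info['state']} RES={mib:.1f} MiB")
```

Under that assumption, the last sample's RES of 5852 corresponds to a few MiB rather than a few KB, and the state column shows D in both samples.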

Edit: I have put the last available log from before the server crashed here: https://pastebin.com/ZM1XiUid Is there anything in there that could have caused the crash?

laurent

1 Answer


There are several things you can notice from the top output:

  1. The process is using more than 5 GB of resident memory in the first sample (5.181g in the RES column, 70.9% of memory).
  2. The process is in uninterruptible sleep (state D) in almost every sample. This state means the process is waiting for something, most likely I/O such as a disk operation.

If you are getting too many processes in the D state, your server load will become too high, because on Linux the load average counts tasks in uninterruptible sleep as well as runnable ones. To fix this, you need to find out what these processes are doing or waiting for. It could be too many requests hitting a slow disk.
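One way to see how many processes are stuck in the D state at a given moment is to read the state field from `/proc/<pid>/stat`. This is a rough sketch that assumes a Linux `/proc` filesystem; the function names `state_from_stat` and `d_state_pids` are invented for this example and are not part of any tool mentioned in the thread.

```python
import os


def state_from_stat(stat_text: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The line looks like '1706 (php) D 1 ...'. The command name may itself
    contain spaces or parentheses, so split after the LAST ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def d_state_pids(proc_root: str = "/proc") -> list:
    """List PIDs currently in uninterruptible sleep (state D)."""
    if not os.path.isdir(proc_root):
        return []  # not Linux, or /proc is not mounted
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        path = os.path.join(proc_root, name, "stat")
        try:
            with open(path, errors="replace") as fh:
                if state_from_stat(fh.read()) == "D":
                    pids.append(int(name))
        except (OSError, IndexError):
            continue  # the process exited while we were scanning
    return sorted(pids)


blocked = d_state_pids()
print(f"{len(blocked)} process(es) in D state: {blocked}")
```

Running this repeatedly while the load is climbing would show whether the number of blocked processes grows along with it. It does not say what they are waiting on, so it is only a starting point for the investigation.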

Such a situation may not bring the whole server down, but it will at least make it unstable or unresponsive.

This post explains what an uninterruptible sleep state is.

Khaled
  • Thank you, I hadn't noticed these D states; I will have a look at them. By the way, I have put the last log (before the server crashed) here: https://pastebin.com/ZM1XiUid Is there anything in there that could show the cause of the crash? – laurent Mar 29 '17 at 12:39