
Context

I use and manage* a Windows server at work. It is used for general computations by up to 5 users at a time via RDP. The server has 128GB of RAM, which is sufficient for the work we do on it. However, now and then one process eats up almost all of the memory because of a mistake in a user's script (e.g. an erroneous array initialization, or forgetting to remove a variable).

When that happens, all RDP connections are dropped and the server is uncontrollable until the memory usage is reduced or the server is restarted. The latter is a last resort as that leads to data loss for all users. I'm not sure what the exact memory threshold is before this "crash" happens, but it's somewhere in the 97% region.

Things that I've tried

Commands that do work

While the server is under heavy load, I can still get a response from it with the commands below:

  • ping: works normally
  • tasklist /s servername: returns data, but is very slow. It does allow me to find the offending PID and session ID.
  • Enter-PSSession servername: works, but only starts a session after a very long time

Commands that do not work

I've tried the commands below to kill the offending process and regain control. Unfortunately, none of them worked within 10-15 minutes.

  • taskkill /s servername /pid pid /f: does nothing and stops after 10-15 minutes with a message about an internal error
  • pskill \\servername pid: does nothing, stopped it manually
  • logoff sessionID /server:servername: does nothing, stopped it manually

Question

How can I quickly kill the memory-eating process when the server is at ~97% of its memory and does not respond to the commands above?

*Corporate IT manages the server overall, but I manage periodic updates, user management, and software installations.

  • do you have any idea as to how often it happens? if so, you could use a scheduled task to check at less-than-that-interval to see if RAM use is growing beyond a safe threshold. i would not wait for `97%`, tho ... [*grin*] – Lee_Dailey Jun 18 '20 at 21:45
  • It happens once, maybe twice, a month, at random times during the day. When it occurs, the memory use can grow and hit the limit in a few seconds, especially if at that moment multiple users are using the memory significantly already. While this question is focussed on killing the offending process, I would also be happy with an answer that would focus on lowering the max RAM a system can use overall, such that there's always some left for RDP connections. – Saaru Lindestøkke Jun 19 '20 at 21:25
  • you may want to make a new Question that focuses on the actual problem ... how to limit the RAM used on an RDP server by any given app/process/user-session. i don't know the answer to that ... – Lee_Dailey Jun 20 '20 at 01:00

1 Answer


This sounds like a memory leak that needs to be diagnosed/resolved by updating the offending software.

Until then, you can try a few things to keep it under control.

If you have identified the source of the problem, you can set up a scheduled task running as SYSTEM to terminate/restart the problematic process/service at intervals/times that will have the least amount of disruption.
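A threshold-based variant of such a scheduled task could be sketched in Python as below. This is only a rough illustration, not a tested solution: the watched process names, the 64 GiB per-process cap, and the helper names are all assumptions, and the "Mem Usage" column of `tasklist` reports the working set in a locale-dependent format, so the parser simply keeps the digits. Whether a local task still gets scheduled in time when memory fills within seconds is also not guaranteed.

```python
import csv
import io
import subprocess

# Hypothetical settings: adjust to the workloads on the server.
WATCHED = ("python.exe", "matlab.exe")
LIMIT_KIB = 64 * 1024 * 1024  # 64 GiB per process, an assumed cap


def parse_tasklist_csv(text):
    """Parse `tasklist /fo csv /nh` output into (name, pid, kib) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # "Mem Usage" looks like "95,000,000 K"; keep only the digits
        digits = "".join(ch for ch in fields[4] if ch.isdigit())
        if digits and fields[1].isdigit():
            rows.append((fields[0], int(fields[1]), int(digits)))
    return rows


def find_over_limit(rows, limit_kib=LIMIT_KIB, watched=WATCHED):
    """Return watched processes above the cap, largest first."""
    hits = [r for r in rows if r[0].lower() in watched and r[2] > limit_kib]
    return sorted(hits, key=lambda r: r[2], reverse=True)


def kill_over_limit():
    """Run locally on the server: list processes and force-kill offenders."""
    out = subprocess.run(
        ["tasklist", "/fo", "csv", "/nh"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, pid, kib in find_over_limit(parse_tasklist_csv(out)):
        print(f"killing {name} (PID {pid}, {kib} KiB)")
        subprocess.run(["taskkill", "/pid", str(pid), "/f"])
```

Calling `kill_over_limit()` from a task that runs every minute or so as SYSTEM would only touch processes that have already crossed the cap, so other users' Python and MATLAB sessions are left alone.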

When dealing with a similar situation recently with low memory caused by our AntiVirus product, I also noticed remote management tools would fail to connect. With products like SCCM or SCOM running, you can still pass commands through those agents.

Another option that worked was to kill the remote process through WMI: fetch the process with PowerShell's Get-WmiObject from the Win32_Process class and call its Terminate() method.

There is a great article demonstrating how to do this with WMI here.
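As a minimal sketch of that WMI route, the Python helper below builds and runs the Windows PowerShell call from an admin workstation. The function names and the timeout are assumptions made for illustration, `servername` stands for a trusted host name, and the PID is the one found earlier with `tasklist /s`. Nothing here guarantees that WMI answers faster than `taskkill` on a starved server; it is simply a different path worth trying.

```python
import subprocess


def build_terminate_command(server, pid):
    """Build a Windows PowerShell call that invokes Win32_Process.Terminate()."""
    pid = int(pid)  # raises ValueError for non-numeric input
    script = (
        f"(Get-WmiObject -Class Win32_Process -ComputerName '{server}' "
        f"-Filter 'ProcessId = {pid}').Terminate()"
    )
    # powershell.exe is Windows PowerShell 5.1, which still ships Get-WmiObject
    return ["powershell.exe", "-NoProfile", "-Command", script]


def terminate_remote(server, pid, timeout=120):
    """Run the command and return the CompletedProcess for inspection."""
    return subprocess.run(
        build_terminate_command(server, pid),
        capture_output=True, text=True, timeout=timeout,
    )
```

For example, `terminate_remote("servername", 1234)` would attempt to end PID 1234; the returned object's `stdout` should contain the WMI result, where a `ReturnValue` of 0 indicates success.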

twconnell
  • Thanks for taking the time to write an answer. I might've not explained it properly, but I understand a memory leak as an unintended increase in memory by a process which can be resolved through updates/patches. In my case the memory increase is intended (a user wants to load a 90GB text file), but they do it in a sub-optimal way by loading it all into memory at once while others are using the system as well. I don't think user error can be resolved by updates of the system. – Saaru Lindestøkke Jun 21 '20 at 09:39
  • A periodic scheduled task would also not work because that would seriously disrupt the users' work. These are all Python and MATLAB processes, and it's impossible to identify up front which PID needs a pre-emptive restart to prevent memory overuse. Restarting all Python/MATLAB processes periodically would seriously disrupt the work of other colleagues. I'll look into SCCM, SCOM and WMI, thanks for those suggestions. – Saaru Lindestøkke Jun 21 '20 at 09:39