I recently had an issue where my server powered off in the middle of running a script. It seemed random, but it happened at about the same point each time. Whenever I tried to power the server on again, it would begin the startup process, reach a certain point, and then power itself off before reaching the login options.
I initially thought it was something to do with the script and the packages being installed, but after looking at the security and CIS Benchmark documentation I found that the OS is installed with the Level 2 - Server configuration to meet security requirements.
For what I need to do on my own server, I can apply a workaround: edit the auditd.conf file and change the setting that causes the shutdown. However, in the production environment this workaround may not be appropriate or allowed.
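To make the workaround easier to reason about, here is a minimal Python sketch that reads auditd.conf-style text and reports which settings would shut the machine down or drop it to single-user mode. The exact setting involved is not named above, so the sample values (including `admin_space_left_action = halt`) are assumptions chosen to resemble a hardened configuration, and the helper names `parse_auditd_conf` and `risky_settings` are hypothetical.

```python
# Actions that auditd.conf(5) documents as taking the system down:
# "halt" shuts the machine down, "single" drops it to single-user mode.
RISKY_ACTIONS = {"halt", "single"}

# Real auditd.conf keys that accept one of the actions above.
WATCHED_KEYS = (
    "space_left_action",
    "admin_space_left_action",
    "disk_full_action",
    "disk_error_action",
)


def parse_auditd_conf(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def risky_settings(settings):
    """Return the watched keys whose action would take the system down."""
    return {
        key: settings[key]
        for key in WATCHED_KEYS
        if settings.get(key, "").lower() in RISKY_ACTIONS
    }


# ASSUMPTION: illustrative values only, not copied from the real server.
SAMPLE = """
# sample auditd.conf fragment (hypothetical)
max_log_file = 8
max_log_file_action = keep_logs
space_left_action = email
admin_space_left_action = halt
disk_full_action = suspend
"""

if __name__ == "__main__":
    for key, action in risky_settings(parse_auditd_conf(SAMPLE)).items():
        print(f"{key} = {action}")
```

On a real system the same functions could be pointed at the contents of /etc/audit/auditd.conf (reading it normally requires root) to see which setting is responsible before deciding what to change.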
I have a couple of questions about this:
Can anything be done once the server reaches this state, given that you can't even log on at that point, or is the only option to reimage the server? (This is what I've been resorting to.)
Presumably (I'm still trying to understand all the config options) this should not normally happen, because the logs have some sort of rotation and retention policy. Have I just hit a corner case where what I need to do fills these logs beyond the expected use case?