This server runs several satellite imagery processing jobs. It has 256GB of RAM, a 12TB disk, and 64 CPU cores (Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz). It should not fail under this load, but sometimes it does. This is a screen capture of a typical htop.
When the system fails, I can capture its last console message through the IPMI remote console. The most recent one is this:
With systemd failing to start these services, the server stops working and we cannot log in over ssh to fix it, so we have to hard reset it. What should we do to prevent this problem?
EDIT: The server has one 240GB M.2 disk for the operating system, mounted on /, and the 12TB disk mounted on /data. The system is ...
Linux tsom02 5.10.0-12-amd64 #1 SMP Debian 5.10.103-1 (2022-03-07) x86_64 GNU/Linux
The M.2 disk is partitioned with only 28GB for /. Could that be the reason? Should I allocate more space to /?
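To test the theory that / is filling up, disk usage on both mount points could be logged periodically (for example from cron) before the next failure. This is only a minimal sketch in Python: the function names and the 90% threshold are my own assumptions, not anything already configured on the server.

```python
import shutil


def usage_percent(path):
    """Return the percentage of used space on the filesystem containing path."""
    total, used, _free = shutil.disk_usage(path)
    if total == 0:
        # Some pseudo filesystems report a size of zero.
        return 0.0
    return 100.0 * used / total


def check_mounts(paths, threshold=90.0):
    """Return (path, percent) pairs for mounts at or above threshold.

    The 90% default is an arbitrary choice for illustration.
    Paths that cannot be queried are skipped.
    """
    report = []
    for path in paths:
        try:
            pct = usage_percent(path)
        except OSError:
            continue
        if pct >= threshold:
            report.append((path, pct))
    return report


if __name__ == "__main__":
    # "/" and "/data" are the two mount points described above.
    for path, pct in check_mounts(["/", "/data"]):
        print(f"WARNING: {path} is {pct:.1f}% full")
```

If / is reported as nearly full shortly before a crash, that would support the small-partition theory; if it never gets close, the cause is probably elsewhere.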
The output of vmstat 5 5 is: