Investigating & Preventing DNS Service Crash

Question

I am running a VPS on a DigitalOcean droplet and have been facing DNS service crashes over the last three weeks. The first time was in late May: the site wasn't resolving to its IP and wasn't accessible. The same thing happened yesterday, and I wasn't aware of it until I opened the site; unfortunately it was down for about 24 hours. :(

There are around 3500 log messages within a very short period (3-5 minutes) before the crash. Here are a few entries related to the named service being killed:

Jun 17 09:33:49 server kernel: Out of memory: Kill process 31644 (/usr/sbin/amavi) score 21 or sacrifice child
Jun 17 09:33:49 server kernel: Killed process 31644 (/usr/sbin/amavi), UID 990, total-vm:376848kB, anon-rss:108kB, file-rss:0kB, shmem-rss:0kB
Jun 17 09:33:49 server kernel: php-cgi invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jun 17 09:33:49 server kernel: [<ffffffffb89c0e84>] oom_kill_process+0x254/0x3e0
Jun 17 09:33:49 server kernel: [<ffffffffb89c092d>] ? oom_unkillable_task+0xcd/0x120
Jun 17 09:33:52 server kernel: Out of memory: Kill process 2539 (named) score 19 or sacrifice child
Jun 17 09:33:52 server kernel: Killed process 2539 (named), UID 25, total-vm:235396kB, anon-rss:12kB, file-rss:0kB, shmem-rss:0kB
Jun 17 09:33:53 server systemd: amavisd.service: main process exited, code=killed, status=9/KILL
Jun 17 09:33:53 server systemd: named.service: main process exited, code=killed, status=9/KILL
Jun 17 09:33:54 server systemd: lfd.service: main process exited, code=killed, status=9/KILL
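On "what entries to look for": on CentOS 7 the kernel's OOM messages normally end up in /var/log/messages, and every victim gets a "Killed process" line like the ones above. Below is a minimal Python 3 sketch (my own illustration, not something from the question) that pulls those lines out so you can list every process the OOM killer took down, and when. The regular expression is written against the exact line format in this excerpt; other kernel versions word the line differently, so treat it as an assumption to adjust if nothing matches.

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern for the kernel's "Killed process" line as it appears in the log
# excerpt above (CentOS 7 kernel). Other kernels word this line differently,
# so this pattern is an assumption to adjust for your system.
KILLED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]*)\)"
    r"(?:, UID \d+)?, total-vm:(?P<vm>\d+)kB, anon-rss:(?P<anon>\d+)kB"
)


class OomKill(NamedTuple):
    timestamp: str    # syslog timestamp, e.g. "Jun 17 09:33:52"
    pid: int
    name: str         # process name as the kernel printed it (max 15 chars)
    total_vm_kb: int  # virtual memory size at kill time
    anon_rss_kb: int  # anonymous resident memory at kill time


def find_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return one OomKill per 'Killed process' line, in log order."""
    kills = []
    for line in lines:
        m = KILLED_RE.match(line)
        if m:
            kills.append(OomKill(
                timestamp=m.group("ts"),
                pid=int(m.group("pid")),
                name=m.group("name"),
                total_vm_kb=int(m.group("vm")),
                anon_rss_kb=int(m.group("anon")),
            ))
    return kills
```

You would pass it an open log file (reading /var/log/messages and its rotated copies /var/log/messages-* usually requires root) and print the result.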

Additional Info:

Running CentOS 7 with CWP
The service crashed because the server ran out of memory.
It looks like an attacker was targeting php-cgi, imap-login and the imap service (I assume so because there are many log entries related to those services).

Questions:

How do I prevent this?
How can I find out the root cause? (I know, logs, but what kind of entries should I look for, and in which file?)
Is there any way to get notified in a situation like this (to minimize downtime)?

Your problem is nothing specific to the "DNS" service. If you look at your logs, you can see it killed amavisd and lfd too. When there is no memory left, the kernel starts looking for processes to kill, so any process can be killed. You need to find out what is eating all the memory and address that, at the very least by capping the maximum amount of memory the misbehaving process can use. — Patrick Mevzek, Jun 18 '20 at 14:19
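The "score 21" and "score 19" in the log are OOM badness scores: the OOM killer kills the process with the highest one. The kernel exposes the current value as /proc/<pid>/oom_score, and /proc/<pid>/oom_score_adj (range -1000 to 1000) is a per-process bias added to it. A minimal Python 3 sketch (Linux only; the function names are my own) that ranks running processes by that score, so you can see which ones are next in line when memory runs out:

```python
import os
from typing import List, Tuple


def _read(path: str) -> str:
    """Read a small /proc file and strip the trailing newline."""
    with open(path) as fh:
        return fh.read().strip()


def oom_ranking(limit: int = 10) -> List[Tuple[int, int, int, str]]:
    """Return (oom_score, oom_score_adj, pid, name), highest score first."""
    rows = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int(_read(f"/proc/{pid}/oom_score"))
            adj = int(_read(f"/proc/{pid}/oom_score_adj"))
            name = _read(f"/proc/{pid}/comm")
        except OSError:
            continue  # process exited mid-scan, or the file is not readable
        rows.append((score, adj, int(pid), name))
    rows.sort(reverse=True)
    return rows[:limit]
```

Lowering a service's oom_score_adj only changes which process gets killed; it does not fix the memory shortage itself, which is why capping the misbehaving process is the real fix.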
Also, about "and I wasn't aware of that until I opened the site, unfortunately it was down for about 24 hours. :(": in a business setting, any critical asset, including a website, should be monitored, either by doing it yourself (but not from the same box, otherwise the monitoring could die with it) or by using any provider offering monitoring services. — Patrick Mevzek, Jun 18 '20 at 14:20
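A minimal Python 3 sketch of such an external check, meant to run periodically (for example from cron) on a machine other than the droplet: it verifies that the hostname still resolves and that the site answers over HTTP. The hostname and URL you pass in are placeholders, and how you deliver the alert (mail, chat webhook, etc.) is left to you.

```python
import socket
import urllib.error
import urllib.request
from typing import List, Tuple


def dns_resolves(hostname: str) -> bool:
    """True if the local resolver returns at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def http_ok(url: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Fetch url. urlopen raises HTTPError for error statuses (4xx/5xx),
    so reaching the with-block means the site answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        return False, f"HTTP {exc.code}"
    except OSError as exc:  # URLError, timeouts, refused connections
        return False, str(exc)


def check(hostname: str, url: str) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if not dns_resolves(hostname):
        problems.append(f"DNS: {hostname} does not resolve")
    ok, detail = http_ok(url)
    if not ok:
        problems.append(f"HTTP: {url} failed ({detail})")
    return problems
```

One caveat: the monitoring host's resolver caches answers, so if named on the droplet dies, the DNS check may keep passing until the cached record's TTL expires.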
If there were 3500 log entries related to this, then you should look through them for anything unusual. — Michael Hampton, Jun 18 '20 at 15:21
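One quick way to do that is to count messages per program in the few minutes before the crash, which shows which service was flooding the log. A minimal Python 3 sketch, assuming the classic syslog line format shown in the question; because syslog timestamps carry no year, the window bounds use the same "Jun 17 09:33:49" form (and a window spanning New Year is not handled).

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable

# Classic syslog prefix: "Jun 17 09:33:49 server program[pid]: message".
# Group 1 is the timestamp, group 2 the program tag without "[pid]" or ":".
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+([^\s:\[]+)")


def _parse_ts(text: str) -> datetime:
    # Syslog timestamps have no year; a fixed leap year is prepended so that
    # "Feb 29" parses and only month, day and time are compared.
    return datetime.strptime("2000 " + " ".join(text.split()), "%Y %b %d %H:%M:%S")


def count_by_program(lines: Iterable[str], start: str, end: str) -> Counter:
    """Count lines per program tag with a timestamp in [start, end]."""
    lo, hi = _parse_ts(start), _parse_ts(end)
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a syslog-formatted line
        try:
            ts = _parse_ts(m.group(1))
        except ValueError:
            continue  # looked like a timestamp but was not one
        if lo <= ts <= hi:
            counts[m.group(2)] += 1
    return counts
```

Calling most_common(10) on the result for a window such as "Jun 17 09:28:00" to "Jun 17 09:34:00" lists the noisiest programs first.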
@PatrickMevzek any tips for tracking down the process that eats all the memory? — Alena, Jun 20 '20 at 04:58
It is almost impossible to trace things back in time unless you planned in advance to have some monitoring that records your server state regularly. You should install a monitoring solution and do a full assessment of what runs on your server. You should consult your logs, like the web server logs, to find any suspicious queries. You should make sure your OS and all applications are up to date. I also hope you have backups, because your server may have been compromised. — Patrick Mevzek, Jun 20 '20 at 05:31
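A minimal Python 3 sketch of the "records your server state regularly" idea (Linux only, not a replacement for a real monitoring tool): each call to record_snapshot appends the top memory consumers, by resident memory (VmRSS from /proc/<pid>/status), with a timestamp to a CSV file. The file path and how often you call it (for example, every minute from cron) are up to you; the next time the OOM killer fires, the last rows before the crash show what was growing.

```python
import csv
import os
import time
from typing import List, Tuple


def parse_status(text: str) -> Tuple[str, int]:
    """Return (Name, VmRSS in kB) from /proc/<pid>/status text.
    Kernel threads have no VmRSS line, so RSS defaults to 0."""
    name, rss = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])
    return name, rss


def top_rss(n: int = 5) -> List[Tuple[int, str, int]]:
    """Return (pid, name, rss_kb) for the n processes using the most RSS."""
    rows = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/status") as fh:
                name, rss = parse_status(fh.read())
        except OSError:
            continue  # process exited while scanning
        rows.append((rss, name, int(pid)))
    rows.sort(reverse=True)
    return [(pid, name, rss) for rss, name, pid in rows[:n]]


def record_snapshot(path: str, n: int = 5) -> None:
    """Append one timestamped row per top process to the CSV file at path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for pid, name, rss in top_rss(n):
            writer.writerow([stamp, pid, name, rss])
```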
Which monitoring service / app would you recommend for this? — Alena, Jun 20 '20 at 06:05
