
My EC2 t2.small instance CPU Utilization goes to 100% every day at the same time, roughly between 21:25 and 21:30 server time.


I have checked syslog and the Apache logs and found nothing unusual during that time. I have also checked my own cron jobs and the system cron jobs and found no daily job running at that time (/etc/cron.daily is scheduled at 6:25 and, according to the logs, runs correctly at that time).

Any ideas what could cause this behavior?

OS: Ubuntu 16.04

Milos Dakic
  • Are you sure this isn't the 6:25 cron job, and the time zone difference between the server and the CloudWatch console is making it appear to be 21:25? – Mark B Jan 04 '19 at 16:50
  • Yes, I have double-checked that. The daily cron run is recorded in syslog at 6:25 every day, and the spike is recorded in the same syslog file around 21:26, so they cannot be the same event. BTW, the CloudWatch console shows the spike time in my timezone as 2:25. – Milos Dakic Jan 04 '19 at 17:05
  • I don't know how anyone is going to be able to tell you what process is causing the CPU spike. You need to use something like `top` to capture the exact process that is using the CPU during that time. – Mark B Jan 04 '19 at 17:13
  • Yes, I have set up a cron job to run top and save the output to a file during that time; I just have to wait until tomorrow to get the result. I thought that maybe someone had had a similar experience and could suggest the usual suspects for this. – Milos Dakic Jan 04 '19 at 18:04
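
The capture approach discussed in the comments above could look roughly like the following. This is only a minimal sketch, not the script that was actually used: the `capture` function, the `TOP_COMMAND` constant and any log path are hypothetical names, and it assumes a `top` that supports batch mode (`-b`) and an iteration count (`-n`), as procps does on Ubuntu.

```python
import subprocess
from datetime import datetime

# Assumption: procps top; -b is batch mode, -n 1 prints one iteration and exits.
TOP_COMMAND = ["top", "-b", "-n", "1"]


def capture(command, log_path, now=None):
    """Run `command` and append its output to `log_path` under a timestamp header."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    # universal_newlines=True returns text instead of bytes.
    output = subprocess.check_output(command, universal_newlines=True)
    with open(log_path, "a") as log:
        log.write("=== {} ===\n{}\n".format(stamp, output))
    return output
```

A script that ends with `capture(TOP_COMMAND, "/tmp/top_capture.log")` could then be run from cron every minute around the spike, for example with a schedule such as `20-35 21 * * *`; the path and the schedule here are placeholders.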

2 Answers


After a lot of searching, logging, and trial and error, I found that apache2 was causing this. For some reason, the process occasionally hangs at 100% CPU, specifically when an SSL test from ssllabs.com is run against the server. I found this line in the Apache error_log:

[ssl:error] [pid 29110] [client 64.41.200.104:58242] AH02042: rejecting client initiated renegotiation

Again, after some trial and error, what fixed the issue was updating the SSLCipherSuite Apache directive in /etc/apache2/mods-available/ssl.conf.

Here is the value I used: SSLCipherSuite ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256

After this change, the problem never repeated. I hope this will help someone else in the same situation.
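
To check whether the same error shows up in your own log, something like the following may help. It is a rough sketch under assumptions: the function name `renegotiation_clients` is made up for illustration, and the pattern only relies on the `[client ip:port] AH02042:` part of the line quoted above, so it should be adjusted if your error log format differs.

```python
import re
from collections import Counter

# Matches "[client <address>:<port>] AH02042:" as in the log line quoted above.
RENEGOTIATION = re.compile(r"\[client ([^\]]+):(\d+)\] AH02042:")


def renegotiation_clients(lines):
    """Count AH02042 (rejected client-initiated renegotiation) entries per client address."""
    counts = Counter()
    for line in lines:
        match = RENEGOTIATION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open error log file object to `renegotiation_clients` and printing `counts.most_common()` shows which clients trigger the error most often, which can then be compared with the time of the CPU spike.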

Milos Dakic

My service ran into a similar issue (CPU spikes from a daily cron job). It turned out that logrotate was compressing a very large log file every morning using pbzip2. A code change that cut down on the spammy logging resolved the issue.
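
If you suspect the same cause, a quick way to look for oversized logs is sketched below. This is an illustration only, not what we actually ran: `large_logs` is a hypothetical helper, and the directory and size threshold you pass in are up to you.

```python
import os


def large_logs(directory, min_bytes):
    """Return (size, path) pairs for files under `directory` of at least `min_bytes`, largest first."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable (for example, rotated mid-scan); skip it.
                continue
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

For example, `large_logs("/var/log", 500 * 1024 * 1024)` lists files of 500 MiB or more; anything that large is a candidate for an expensive compression step during rotation.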

Christopher Liu