AWS RDS: random spikes in Freeable Memory and Swap Usage

Question

I have a db.m4.large RDS instance running in production. I am seeing frequent temporary drops in Freeable Memory and Swap Usage, and I am trying to figure out why.

Here are screenshots of some of the data I can see in CloudWatch.

Freeable Memory - 1 week period

Swap Usage - 1 week period

Number of DB connections - 1 week period

Number of received client requests - 1 week period

I first noticed that the spikes almost always occur at the same time of day and only on weekdays. I checked whether those times fall inside the maintenance and backup windows configured for the RDS instance, but they don't. I then looked for a cron job that could trigger the spikes but could not find one. All of my cron jobs run every day, weekdays and weekends alike. Most of them run every minute or every hour, which does not match the pattern I see in CloudWatch. Only one job runs on a daily basis, but it restores and uses an already created snapshot of the RDS instance. That job also runs every day, weekends included.
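To confirm the weekday and time-of-day pattern from the raw numbers rather than by eye, the datapoints can be bucketed by weekday and hour. This is only a minimal sketch: it assumes the Freeable Memory datapoints have already been exported from CloudWatch as (datetime, value) pairs, and the function names and the 0.8 threshold are made up for illustration.

```python
from collections import Counter
from statistics import median

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def find_drops(points, ratio=0.8):
    """Return timestamps whose value is below ratio * median of the series.

    points: list of (datetime, value) pairs, e.g. Freeable Memory in bytes.
    ratio:  hypothetical threshold; tune it to the size of the real drops.
    """
    if not points:
        return []
    baseline = median(value for _, value in points)
    return [ts for ts, value in points if value < ratio * baseline]


def bucket_by_weekday_hour(timestamps):
    """Count timestamps per (weekday, hour) bucket to expose a schedule."""
    return Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)
```

If the drops really follow a schedule, `bucket_by_weekday_hour(find_drops(points))` should concentrate in a handful of buckets, which can then be compared against job schedules on every system that connects to the database.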

I noticed the number of connections to the database drops when those spikes occur, but I can't tell if there is a relation since the number of connections constantly varies.

I took a look at the logs of my application but could not find any unusual activity when the spikes occur. All the client requests received when the spikes occur are requests that are constantly received during the day. Traffic-wise, the number of client requests received when the spikes occur is generally at its lowest level of the day (it usually occurs at 4:30am).

Finally, I obviously took a look at my slow query log (whose threshold is set to 2 seconds) but there isn't a single entry.

I contacted AWS Support and asked them to check the hardware of the machine; they told me it seems totally fine. They also confirmed the correlation between Freeable Memory and Swap Usage and pointed me to some documentation on how MySQL/InnoDB handles memory tables and internal temporary tables. They added that my instance may be keeping some temporary and/or internal tables and cleaning them up once the related connection closes and the associated thread is destroyed. However, since the spikes only last for 3 minutes, I'm not sure this is related.
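One way to test the temporary-table theory would be to compare MySQL's `Created_tmp_tables` and `Created_tmp_disk_tables` status counters just before and just after a spike. The sketch below assumes the two snapshots are dicts built from the output of `SHOW GLOBAL STATUS LIKE 'Created_tmp%'`; the function name is hypothetical, and the counters are cumulative since server start, so only the difference between snapshots is meaningful.

```python
def tmp_table_delta(before, after):
    """Return how many internal temporary tables were created between two
    SHOW GLOBAL STATUS snapshots, and what share of them went to disk.

    before, after: dicts mapping status variable names to values
    (strings or ints, as returned by the MySQL client in use).
    """
    keys = ("Created_tmp_tables", "Created_tmp_disk_tables")
    delta = {key: int(after[key]) - int(before[key]) for key in keys}
    total = delta["Created_tmp_tables"]
    # Share of the new temporary tables that were created on disk.
    delta["disk_ratio"] = delta["Created_tmp_disk_tables"] / total if total else 0.0
    return delta
```

A jump in these deltas during the 3-minute window, compared with a quiet window of the same length, would support the explanation from AWS Support; flat deltas would point elsewhere.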

I'm starting to run out of ideas, and I was wondering whether anyone has experienced a similar situation or has any suggestions.

Thanks in advance.

This was just announced for RDS performance issues: https://aws.amazon.com/about-aws/whats-new/2017/10/announcing-open-preview-of-performance-insights/ — John Hanley, Nov 04 '17 at 07:06
The graph points strongly suggest that your cause is a recurring job, but a job whose impact on the server has a periodic variation that you aren't catching. When index statistics shift, query plans can shift, and suddenly the optimizer takes a dramatically different approach to a routine query. I would expect it to show in the slow query log, though. You might want to verify that the slow query log is enabled and working if it is entirely empty. — Michael - sqlbot, Nov 04 '17 at 16:54
RDS Performance Insights seems awesome and would very probably give me a better idea of what the issue is. However, it seems it isn't available for me yet unfortunately (region not supported yet and I'm not using Aurora PostgreSQL). As for the query log, I ran a query that took more than 2 seconds to execute on purpose and could see it in slow query logs of my instance, so it seems to be working as expected. — moewb, Nov 06 '17 at 09:17
