I have a db.m4.large RDS instance running on production. I am experiencing frequent temporary drops in Freeable Memory and Swap Usage and I am trying to figure out why.
Here are screenshots of some of the data I can see in CloudWatch.
Freeable Memory - 1 week period
Number of DB connections - 1 week period
Number of received client requests - 1 week period
I first noticed that the spikes almost always occur at the same time in the day and only on weekdays. I checked whether those times are inside the maintenance and backup windows one can set for an RDS instance, but they aren't. I then looked for any cron job that could trigger those spikes but could not find any. All the cron jobs I have run regardless of the day being a weekday or during a weekend. Also, they run every minute or every hour, which does not correlate with the data I can visualize in CloudWatch. There is only one job that runs on a daily basis, but it restores and uses an already created snapshot of the RDS instance. Plus, that job also runs everyday, weekdays and weekend.
I noticed the number of connections to the database drops when those spikes occur, but I can't tell if there is a relation since the number of connections constantly varies.
I took a look at the logs of my application but could not find any unusual activity when the spikes occur. All the client requests received when the spikes occur are requests that are constantly received during the day. Traffic-wise, the number of client requests received when the spikes occur is generally at its lowest level of the day (it usually occurs at 4:30am).
Finally, I obviously took a look at my slow query log (whose threshold is set to 2 seconds) but there isn't a single entry.
I contacted the AWS Support, asking them to take a look at the hardware of the machine and they told me the hardware seems totally fine. They also confirmed the correlation between Freeable Memory and Swap Usage and pointed me to some documentation related to how MySQL/InnoDB handles memory tables and internal temporary tables. They added that my instance may be keeping some temporary and/or internal tables and clean them up once the related connection closes and the associated thread gets destroyed. However, considering the fact that those spikes only last for 3 minutes, I'm not sure about the relation here.
I'm starting to run out of ideas and I was wonderful if anyone has ever experimented such situation and/or would have suggestions?
Thanks in advance.