Background:
I have a live Django app that utilizes 4 Redis instances.
The first two are large: their backups amount to ~2GB and ~4.4GB respectively. The other two are small: ~85MB and ~15MB.
redis-server --version yields Redis server v=4.0.2 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=401ce53d7b0383ca.
The problem:
It's a busy server running PostgreSQL 9.6.5 as well. PG data and Redis backups are both saved on the secondary drive xvdb.
I've noticed that whenever my big Redis instances start backing up, disk I/O naturally spikes and PostgreSQL commit statements start piling up in the slow log. Behold:
21:49:26.171 UTC [44861] ubuntu@myapp LOG: duration: 3063.262 ms statement: COMMIT
21:49:26.171 UTC [44890] ubuntu@myapp LOG: duration: 748.307 ms statement: COMMIT
21:49:26.171 UTC [44882] ubuntu@myapp LOG: duration: 1497.461 ms statement: COMMIT
21:49:26.171 UTC [44893] ubuntu@myapp LOG: duration: 655.063 ms statement: COMMIT
21:49:26.171 UTC [44894] ubuntu@myapp LOG: duration: 559.743 ms statement: COMMIT
21:49:26.172 UTC [44883] ubuntu@myapp LOG: duration: 1415.733 ms statement: COMMIT
As a consequence, this is what my PostgreSQL commits look like every day:
The question:
Is there anything I can do on the Redis side to help smooth out these I/O spikes? I'd like Redis and PostgreSQL to coexist as harmoniously as they can on a single machine.
More information:
Ask for more information if you need it.
Machine specs:
AWS EC2 m4.4xlarge (16 cores, 64GB RAM)
Elastic Block Store gp2 volumes (105 IOPS, burst up to 3000 IOPS)
The following config exists in the Append Only Mode section of my Redis conf files:
appendonly no
appendfilename "appendonly.aof"
# appendfsync always
appendfsync everysec
# appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble no
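Since appendonly is set to no, the remaining AOF directives above should have no effect, which would leave the RDB snapshots as the only persistence writes coming from Redis. A minimal Python sketch that checks this from the config text follows. The helper names `parse_redis_conf` and `aof_enabled` are hypothetical, and the parsing is deliberately simplified (it only skips comments and strips double quotes):

```python
# AOF section copied from the Redis conf shown above.
AOF_SECTION = '''\
appendonly no
appendfilename "appendonly.aof"
# appendfsync always
appendfsync everysec
# appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble no
'''


def parse_redis_conf(text):
    """Return {directive: value} for non-comment lines (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        name, _, value = line.partition(" ")
        settings[name.lower()] = value.strip().strip('"')
    return settings


def aof_enabled(settings):
    """AOF is only active when `appendonly` is explicitly `yes`."""
    return settings.get("appendonly", "no").lower() == "yes"


settings = parse_redis_conf(AOF_SECTION)
print("AOF enabled:", aof_enabled(settings))
print("appendfsync (inert while AOF is off):", settings["appendfsync"])
```

This only inspects the file contents; a running instance could still have had AOF switched on at runtime, so the live value would need to be checked separately.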
Typical iostat -xmt 3 values are:
10/15/2017 08:28:35 PM
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
           10.44    0.00    0.93    0.15    0.06   88.43
Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
xvda        0.00     0.00     0.00     2.00     0.00     0.04    38.67     0.00     0.00     0.00     0.00     0.00     0.00
xvdb        0.00     2.67     0.00    44.67     0.00     0.41    18.99     0.13     2.81     0.00     2.81     1.07     4.80
Compare that to the same around the time slow commits are logged:
10/15/2017 10:18:11 PM
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            8.16    0.00    0.65   11.90    0.04   79.24
Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
xvda        0.00     4.00     0.00     1.00     0.00     0.02    48.00     0.00     1.33     0.00     1.33     1.33     0.13
xvdb        0.00     0.00     1.67  1312.00     0.01   163.50   254.90   142.56   107.64    25.60   107.75     0.76   100.00
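To make the difference between the two samples easier to see, here is a small Python sketch that parses the xvdb rows and prints how much each write-related metric grows. The function names `parse_row` and `growth` are illustrative, and the column list assumes the iostat -x layout shown in the output above:

```python
# Column order as printed by iostat in the samples above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s",
           "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await",
           "svctm", "%util"]

# xvdb rows copied from the typical sample and the slow-commit sample.
TYPICAL = "0.00 2.67 0.00 44.67 0.00 0.41 18.99 0.13 2.81 0.00 2.81 1.07 4.80"
SPIKE = ("0.00 0.00 1.67 1312.00 0.01 163.50 254.90 "
         "142.56 107.64 25.60 107.75 0.76 100.00")


def parse_row(row):
    """Map each iostat column name to its float value."""
    return dict(zip(COLUMNS, (float(v) for v in row.split())))


def growth(typical, spike, keys):
    """Return spike/typical ratios for the given columns.

    Only call this with columns that are nonzero in the typical sample.
    """
    return {k: spike[k] / typical[k] for k in keys}


typical = parse_row(TYPICAL)
spike = parse_row(SPIKE)
factors = growth(typical, spike,
                 ["w/s", "wMB/s", "avgqu-sz", "w_await", "%util"])

for name, factor in factors.items():
    print(f"{name:>9}: {typical[name]:>8.2f} -> {spike[name]:>8.2f}"
          f"  (x{factor:.1f})")
```

On these two samples, write throughput on xvdb grows by a factor of roughly 400 and write latency (w_await) by a factor of roughly 38, while the device sits at 100% utilization.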
The first Redis instance has the following snapshotting config:
save 7200 1
#save 300 10
#save 60 10000
The second Redis instance has the following snapshotting config:
save 21600 10
#save 300 10
#save 60 10000
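To spell out what these rules mean: a line `save <seconds> <changes>` triggers a background snapshot once the given number of seconds has elapsed and at least that many writes have occurred, and commented-out lines are ignored. A minimal Python sketch that extracts the active rules from the two snippets above (the helper names `parse_save_rules` and `describe` are hypothetical):

```python
def parse_save_rules(conf_text):
    """Return (seconds, min_changes) for each active `save` line."""
    rules = []
    for line in conf_text.splitlines():
        parts = line.split()
        # Commented-out rules start with "#save" and are skipped here.
        if len(parts) == 3 and parts[0] == "save":
            rules.append((int(parts[1]), int(parts[2])))
    return rules


def describe(rule):
    """Render one rule as a short human-readable string."""
    seconds, changes = rule
    return f"every {seconds / 3600:g} h if >= {changes} change(s)"


INSTANCE_1 = """save 7200 1
#save 300 10
#save 60 10000"""

INSTANCE_2 = """save 21600 10
#save 300 10
#save 60 10000"""

for name, conf in [("instance 1", INSTANCE_1), ("instance 2", INSTANCE_2)]:
    for rule in parse_save_rules(conf):
        print(name, "->", describe(rule))
```

With the configs above, the first instance snapshots at most every 2 hours and the second at most every 6 hours, so on a busy server the two large dumps will coincide on some cycles unless their schedules are offset.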