The Short: We're hitting max connections and seeing very poor performance from MySQL during peak business hours and cannot determine if it is a load problem or a configuration problem.
The Long: To begin, here’s our server specs hosted on Digital Ocean:
Cores: 20
Memory:
total used free shared buff/cache available Mem: 62G 48G 4.7G 216M 10G 14G
MySQL WorkBench Dashboard Snapshot:
We typically have over 200 entries in processlist, almost all of which are "Opening tables", "closing tables", and, slightly less frequent, "cleaning up". The cleaning up entries have COMMAND killed
and NULL
INFO (SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST
).
The only queries we really see taking very long are some INSERTS into tables with heavy use, and UPDATE commands with the primary key in the WHERE for tables with heavy us. Their state is “update” in the processlist.
MySQL .cnf settings:
key_buffer_size=1G
myisam_sort_buffer_size=1073741824
max_length_for_sort_data=8388608
max_sort_length=8388608
sort_buffer_size=1073741824
join_buffer_size=1073741824
preload_buffer_size=1073741824
read_buffer_size=1073741824
read_rnd_buffer_size=1073741824
thread_cache_size=450
max_allowed_packet=100M
max_connections=400
lock_wait_timeout=50
wait_timeout=120
table_open_cache_instances=10
table_open_cache=6000
table_definition_cache=6000
innodb_flush_method=O_DIRECT
innodb_buffer_pool_size=46G
innodb_buffer_pool_instances=46
We have 800+ databases on our single live server, and many thousands of users using an app that makes frequent server requests with lots of individual record updates or inserts that need to sync to multiple users.
The Question:
Given the .cnf settings and our server specs, in relation to the MySQL WorkBench dashboard snapshot, should we be experiencing the kind of slowdown/connections we're seeing with this configuration?
We constantly hit our Max Connections and get SQLSTATE[HY000] [2002] Resource temporarily unavailable
errors in PHP from PDO.
Does anyone have any suggestions to improve the performance? Different config settings? We know long term solutions, like putting large clients on separate servers, among others—but for the short term, is there any way to optimize our server? Our app and website are barely usable during the rush hours.
Additional Stat:
Linux 4.0.4-301.fc22.x86_64 04/25/2016 _x86_64_ (20 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
8.51 0.01 0.67 0.28 0.35 90.18
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
vda
269.78 1681.90 2180.52 37679232421 48849706700