I have a Drupal application which has been running on a single MySQL database server for 12 months, and has been performing relatively well (apart from peak load events). We needed to be able to support much higher spikes than the current DB server allowed, and at 32GB there was not much gain to be had from simply vertically scaling the single DB server.
We decided to set up a new MariaDB Galera cluster with 2x 32GB instances. We matched the configuration as far as possible with the soon-to-be-obselete DB server.
After migrating to the new database servers, we noticed that the CPU usage on those instances was constantly at 100%, and load was steadily increasing. Over the course of 1 hour, load average went from 0.1 to 150.
Initially we thought it might have something to do with the synchronisation between servers, but even with 1 server turned off and no sync occurring the it was still maxing out CPU as long as the web application was making requests to it.
After a lot of experimentation I found that reducing a few of the configuration options had a profound effect on the CPU usage and load. After making the below changes, the load average has stabilised between 4 and 6 on both instances.
The questions
- What are some possible reasons for such a dramatic difference in CPU usage between the old and new servers, despite essentially migrating the configuration from the old server?
- Load is currently hovering between 4 and 6 (and this is a low traffic period for our website). What should I be looking at to try and reduce this value, and ensure that when the site gets hit with some real traffic it wont fall over?
Config changes
innodb_buffer_pool_instances
- Original value: 500 (there are 498 tables total in all databases)
- New value: 92
table_cache
- Original value: 8
- New value: 4
max_connections
- Original value: 1000
- New value: 400
Current configuration
Here is the full configuration file from one of the servers /etc/mysql/my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=1
bind-address=0.0.0.0
max_connections = 400
wait_timeout = 600
key_buffer_size = 16M
max_allowed_packet = 16777216
max_heap_table_size = 512M
table_cache = 92
thread_stack = 196608
thread_cache_size = 8
myisam-recover = BACKUP
query_cache_limit = 1048576
query_cache_size = 128M
expire_logs_days = 10
general_log = 0
max_binlog_size = 10485760
server-id = 0
innodb_file_per_table
innodb_buffer_pool_size = 25G
innodb_buffer_pool_instances = 4
innodb_log_buffer_size = 8388608
innodb_additional_mem_pool_size = 8388608
innodb_thread_concurrency = 16
net_buffer_length = 16384
sort_buffer_size = 2097152
myisam_sort_buffer_size = 8388608
read_buffer_size = 131072
join_buffer_size = 131072
read_rnd_buffer_size = 262144
tmp_table_size = 512M
long_query_time = 1
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
#wsrep_provider_options="gcache.size=32G"
# Galera Cluster Configuration
wsrep_cluster_name="xxx"
wsrep_cluster_address="gcomm://xxx.xxx.xxx.107,xxx.xxx.xxx.108"
# Galera Synchronization Congifuration
wsrep_sst_method=rsync
#wsrep_sst_auth=user:pass
# Galera Node Configuration
wsrep_node_address="xxx.xxx.xxx.107"
wsrep_node_name="xxx01"
[mysqldump]
quick
quote-names
max_allowed_packet = 16777216
[isamchk]
key_buffer_size = 16777216