3

Our company is using TokuDB in production and we are having a lot of problems trying to mitigate lag on our slave. It is very strange, because we're talking about very few rows... yet even with that little data the slave falls behind.

The slave is a read-only DB.
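For context, this kind of slave lag is the Seconds_Behind_Master value reported on the slave itself; it can be checked with the standard command below (the number shown is only an illustrative value):

mysql> SHOW SLAVE STATUS\G
...
        Seconds_Behind_Master: 25
...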

For more information, this is the hardware we are using:

CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (4 cores)

RAM: 16 GB

HDD: 2 TB ST2000DM001 (ext4 filesystem)

Here you can see some I/O performance outputs. I pasted them outside of this post because I think it will be easier to read that way.

iostat -x 1 output when we have a lag situation: http://paste.laravel.com/bjv

fio, for disk I/O: http://paste.laravel.com/bjG
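The exact fio job is only in the paste above; for reference, a random read/write benchmark along these lines can be used (the directory, block size and sizes here are only illustrative):

fio --name=randrw --directory=/var/lib/mysql --ioengine=libaio --direct=1 \
    --rw=randrw --bs=16k --size=1G --numjobs=4 --runtime=60 --group_reporting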

We did a few disk tweaks, extracted from Steven Corona's book http://www.scalingphpbook.com (example commands are sketched after the list):

  • Changed the I/O scheduler to noop.
  • Turned off filesystem access times (noatime and nodiratime) in /etc/fstab.
  • Increased the number of open files in /etc/security/limits.conf:

        * soft  nofile  999999
        * hard  nofile  999999
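For reference, this is roughly how those changes are applied (the device name sda, the UUID and the mount point below are placeholders; adjust them for your setup):

# switch the data disk to the noop scheduler (persist it via the kernel boot line, e.g. elevator=noop)
echo noop > /sys/block/sda/queue/scheduler

# /etc/fstab entry with access times disabled
UUID=xxxx-xxxx  /var/lib/mysql  ext4  defaults,noatime,nodiratime  0 2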

Some tweaks we have made in the MySQL config:

MASTER

# * Query Cache Configuration
query_cache_limit               = 0
query_cache_size                = 0
query_cache_type                = 0
innodb_file_per_table           = 1

innodb_file_format              = barracuda
innodb_flush_method             = O_DIRECT
innodb_flush_log_at_trx_commit  = 1
innodb_log_file_size            = 128M

innodb_buffer_pool_size         = 13500M
innodb_read_io_threads          = 16
innodb_write_io_threads         = 16
innodb_io_capacity              = 180
innodb_thread_concurrency       = 4

SLAVE

# * Query Cache Configuration
query_cache_limit               = 0
query_cache_size                = 0
query_cache_type                = 0
innodb_file_per_table           = 1

innodb_file_format              = barracuda
innodb_flush_method             = O_DIRECT
innodb_flush_log_at_trx_commit  = 2
sync_binlog                     = 1

innodb_log_file_size            = 128M
innodb_buffer_pool_size         = 13500M
innodb_read_io_threads          = 16
innodb_write_io_threads         = 16

innodb_io_capacity              = 180
innodb_thread_concurrency       = 4
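These values can be double-checked on the running servers, in case something overrides them at runtime, for example:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';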
blacksoul
  • You've not said how your filesystems are configured. Do you really get faster I/O with noop compared with deadline? Is your slave used as a failover standby or as a read-only db? Have you tried optimizing these settings? (innodb_thread_concurrency=4 seems a bit low - how many tables do you normally have open) – symcbean Nov 19 '12 at 11:14
  • I read that non-Percona setups should set innodb_thread_concurrency to 1x the number of CPU cores. The slave is working as a read-only DB and the filesystem is ext4. – blacksoul Nov 19 '12 at 11:23
  • 1
    I would want a lot more info about your filesystems - what journalling mode? You might want to consider disabling barriers if you are using data=ordered and can enable async commit and checksums (and even if not - since you have a near replica). Or you might want to use XFS (with barriers enabled). Going back to ext4, increasing the commit time from its default will also help with throughput. Using master-master replication with node affinity gives you a lot more flexibility for backups / schema changes / failover. – symcbean Nov 19 '12 at 12:12
  • I disabled barriers and used data=ordered yesterday, and also enabled async commits and checksums, and I have to say that we have better performance now. We have gone from 20-40 secs of lag to situations with 'only' 1-5 secs. However, this is not enough for us, because right now we don't have that much traffic and we still see some lag. I think we can improve it with some more tweaks like those. – blacksoul Nov 20 '12 at 12:44
  • 1
    Glad to hear you're making progress. Ext3 certainly doesn't scale well with multiple writers - but ext4 might - see http://www.mysqlperformanceblog.com/2012/03/15/ext4-vs-xfs-on-ssd/. Beyond that there's query tuning and more hardware (SSDs / battery-backed (BBU) disk controllers with fast disks), unless you're seeing high load, in which case you might want to tinker with the task scheduler. Alternatively you might want to look at parallel write replication (Percona / Galera / Tungsten), although I suspect that's not going to have a huge impact here. – symcbean Nov 20 '12 at 16:35
  • Yesterday I migrated to MariaDB and in the last 48 hours we have only had 2 'lag situations', both of just 1 second of lag. So I think we can assume it has been mitigated :) Thank you so much! – blacksoul Nov 21 '12 at 14:50

1 Answer

1

I think this is already solved, thanks to @symcbean's help.

I disabled barriers, set data=ordered, and enabled async commit and checksums. Finally, we migrated to MariaDB, but we are still using TokuDB.

I forgot to mention that we also set tokudb_cache_size = 8G, which is 50% of physical memory, as TokuDB recommends.
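For anyone with a similar setup, this is roughly what the result looks like (the UUID and mount point are placeholders; the mount options reflect the ext4 changes discussed in the comments):

# /etc/fstab: barriers off, data=ordered, journal checksums and async commit
UUID=xxxx-xxxx  /var/lib/mysql  ext4  defaults,noatime,nodiratime,data=ordered,barrier=0,journal_checksum,journal_async_commit  0 2

# my.cnf: TokuDB cache at ~50% of the 16 GB of RAM
[mysqld]
tokudb_cache_size = 8G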

blacksoul