
Currently, I have Server A, which is holding about 25 billion records (several terabytes in size) with the following structure:

CREATE TABLE `table_x` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `a1` char(64) DEFAULT NULL,
  `b1` int(11) unsigned DEFAULT NULL,
  `c1` tinyint(1) DEFAULT NULL,
  `LastUpdate` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  PRIMARY KEY (`id`),
  UNIQUE KEY `idxb1a1` (`b1`,`a1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As the data is growing too big, I am trying to migrate these records to Server B, which has the same schema structure, using bulk inserts of 10K records (e.g. INSERT INTO yourtable VALUES (1,2), (5,5), ...;) in ascending order by the id column.
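As a rough illustration, one batch looks something like this (the values shown are made up, and how the last-seen id is tracked is up to the migration script):

-- on Server A: read the next 10K rows in ascending id order
-- (@last_id is just a stand-in for the position of the previous batch)
SELECT id, a1, b1, c1, LastUpdate
FROM table_x
WHERE id > @last_id
ORDER BY id
LIMIT 10000;

-- on Server B: replay the batch as one multi-row INSERT
-- (10,000 value tuples per statement; only a few are shown here)
INSERT INTO table_x (id, a1, b1, c1, LastUpdate) VALUES
  (100001, 'hash-value-1', 42, 1, '2020-05-01 00:00:00'),
  (100002, 'hash-value-2', 43, 0, '2020-05-01 00:00:01'),
  (100003, 'hash-value-3', 44, 1, '2020-05-01 00:00:02');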

Initially, the insertion rate was really quick; however, it gradually slowed down and now takes about 10 secs to bulk insert 10K records (i.e. 1K rows/sec). I am guessing it's because the indexes need to be updated after every insertion.

I have done the following configuration on Server B before starting the migration (a rough sketch of the session setup follows the list):

  • innodb_flush_log_at_trx_commit=2
  • SET unique_checks=0;
  • autocommit=0 and commit every 50K
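Put together, the session on Server B is set up roughly like this before the batches start (the batch statements themselves are the INSERTs described above):

-- relax durability and checks for the duration of the load
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET SESSION unique_checks = 0;
SET SESSION autocommit = 0;

-- ... run five 10K-row INSERT batches ...
COMMIT;   -- one commit per 50K rows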

Server B hardware configuration:

  1. 300GB+ RAM (240GB used for innodb_buffer_pool_size)
  2. SSDs for data storage

Server B my.cnf:

innodb_buffer_pool_size=240G
innodb_buffer_pool_instances=64
innodb_page_cleaners=32

innodb_purge_threads=1

innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_use_native_aio=0
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_autoinc_lock_mode=2
innodb_file_per_table=1
max_connections=10000
skip_name_resolve=1

tmp_table_size=134217728
max_heap_table_size=134217728

back_log=1000
wait_timeout=900

innodb_log_buffer_size=32M
innodb_log_file_size=768M

Is there anything else I can do or configure to speed up the insertion?

Update #1:

The reason I am trying to migrate the records over to Server B is that I would like to break/shard the data across a few servers (to use the MariaDB SPIDER engine sharding solution). As such, solutions that involve sending a snapshot of the data or directly copying the data over don't seem viable.

  • You are guessing the indexes are the cause; maybe try removing the idxb1a1 index and re-adding it after all the data is loaded? (Which will copy the full table, so it will need ample extra disk space.) While you are migrating seems like a great chance to upgrade to utf8mb4, by the way. – ysth May 20 '20 at 07:01
  • @ysth will dropping the index and rebuilding take longer than the actual migration? – GitPirateFlare May 20 '20 at 07:09
  • Have you considered a file-based copy? (E.g. shut the server down & copy the data dir / use a backup / ...) – Solarflare May 20 '20 at 07:53
  • @Solarflare shutting down is not possible because the server is in production, and I would like to reduce the number of records in the table (which means I can't simply copy and paste). – GitPirateFlare May 20 '20 at 08:43

3 Answers


The reason it slows down is likely that your transaction log gets full and the purging isn't keeping up. Increasing innodb_log_file_size (which requires a shutdown with innodb_fast_shutdown=0 and removing the old log files) and innodb_log_files_in_group will postpone the slowdown. Increasing innodb_io_capacity and innodb_io_capacity_max to match what your storage can achieve should also help.
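For example, in my.cnf (the numbers below are only placeholders to show the direction; size the redo log to your write rate and the IO capacity to what your SSDs can actually sustain):

innodb_log_file_size=4G
innodb_log_files_in_group=2
innodb_io_capacity=5000
innodb_io_capacity_max=10000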

Why don't you use xtrabackup to take a point-in-time copy and replication to finish the sync? That will be orders of magnitude faster than INSERT-ing mysqldump style.

Gordan Bobić
  • @Gordan Bobić the reason why I don't use a snapshot feature (like xtrabackup) is that Server B will actually hold a subset of the data, while Server C will hold the remainder. Thanks for the suggestions on the configuration - I will try them out. – GitPirateFlare May 20 '20 at 10:07
  • You may want to look at: 1) partitioning, 2) mysqldump's --where option, 3) MariaDB's SPIDER engine – Gordan Bobić May 20 '20 at 10:12
  • @Gordan Bobić 1) partitioning is not possible due to my schema having 2 unique indexes. 2) Thanks, I'll explore and see whether the mysqldump --where option is faster overall than the migration. 3) Yes, that is actually the precise reason why I am breaking these records down onto smaller servers - so that I can utilize the MariaDB SPIDER engine – GitPirateFlare May 20 '20 at 10:23
  • @GitPirateFlare - So you are setting up "sharding"? What determines the split between the machines? – Rick James Jun 02 '20 at 03:35
  • @Rick James because of the db schema, I had to do a workaround for the MariaDB SPIDER solution to work. The spider node will have the unique key constraint `idxb1a1` removed so that I can set up partitioning (pointing to the data servers) using `id` (range partitioning) - however, the data servers will retain the unique key constraint – GitPirateFlare Jun 04 '20 at 04:57
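For illustration, the workaround described in the last comment would look roughly like this on the SPIDER head node (server names, the range boundary and the connection options are placeholders, not the actual deployment; see the MariaDB SPIDER documentation for the exact options):

-- connection definitions for the backend data servers (placeholders)
CREATE SERVER data_srv1 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.1', DATABASE 'mydb', USER 'spider', PASSWORD '...', PORT 3306);
CREATE SERVER data_srv2 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.2', DATABASE 'mydb', USER 'spider', PASSWORD '...', PORT 3306);

-- SPIDER table: same columns, unique key idxb1a1 removed, range-partitioned on id
CREATE TABLE table_x (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  a1 char(64) DEFAULT NULL,
  b1 int(11) unsigned DEFAULT NULL,
  c1 tinyint(1) DEFAULT NULL,
  LastUpdate timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  PRIMARY KEY (id)
) ENGINE=SPIDER DEFAULT CHARSET=utf8
  COMMENT='wrapper "mysql", table "table_x"'
PARTITION BY RANGE (id) (
  PARTITION p1 VALUES LESS THAN (15000000000) COMMENT='srv "data_srv1"',
  PARTITION p2 VALUES LESS THAN MAXVALUE COMMENT='srv "data_srv2"'
);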

In addition to the answer from @Gordan-Bobić: removing the indices and reapplying them at the end speeds things up a lot.
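As a sketch of that approach on the receiving server (only the secondary unique key from the schema in the question is touched; re-adding it rebuilds the whole index and, as noted in the comments, needs ample free disk space):

-- before the bulk load on Server B
ALTER TABLE table_x DROP KEY idxb1a1;

-- ... run the entire migration ...

-- after the load: rebuild the unique index in one pass
ALTER TABLE table_x ADD UNIQUE KEY idxb1a1 (b1, a1);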

snitch182
  • My concern is that rebuilding the index will take unimaginably long because of the data size (in the terabytes range) – GitPirateFlare May 20 '20 at 10:24
  • yes, it will, but doing it incrementally will take more time overall - after all, you have to write terabytes of data. There is no way around that. A Spotify database migration I heard of took a year ... and they still lost 0.2% of the data ... it was a talk at codetalks 2019 – snitch182 May 20 '20 at 14:11

I agree with @Gordan-Bobić regarding the use of xtrabackup. If you are performing a data migration, a physical copy/backup is your best approach if you want speed. A logical copy, such as mysqldump or a query-based copy, can take much longer because every row has to pass through the checks implied by the configuration loaded at runtime. However, if bulk insert is your option, then consider setting innodb_autoinc_lock_mode=2, if applicable.

Another thing: if you are using INSERT statements, are you loading one row at a time? Consider using multiple-value lists, as this is proven to be faster. Alternatively, consider using LOAD DATA INFILE instead of INSERT.
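For example, if the batches can be staged as files, something along these lines (the file path, delimiters and id range are illustrative only):

-- export a chunk on Server A
SELECT id, a1, b1, c1, LastUpdate
FROM table_x
WHERE id BETWEEN 1 AND 1000000
INTO OUTFILE '/tmp/table_x_chunk_000001.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n';

-- (copy the file from Server A to Server B in between)

-- load the chunk on Server B
LOAD DATA INFILE '/tmp/table_x_chunk_000001.csv'
  INTO TABLE table_x
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  (id, a1, b1, c1, LastUpdate);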

Also set innodb_change_buffering=inserts (since you are using unique_checks=0 in MariaDB) and consider increasing innodb_change_buffer_max_size, for example from 30 to 50. It is best to do this when there is not much other activity on the target table, so that you can monitor the insert activity on the target database server. Also make sure there is not much disturbance from other applications or daemons running on that target server.
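Both of those variables are dynamic, so (if applicable) they can be changed without a restart, roughly like this:

SET GLOBAL innodb_change_buffering = 'inserts';
SET GLOBAL innodb_change_buffer_max_size = 50;  -- percentage of the buffer pool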

Severalnines