
According to the slow query log, the following query (and similar queries) occasionally takes around 2 seconds to execute:

INSERT INTO incoming_gprs_data (data,type) VALUES ('3782379837891273|890128398120983891823881abcabc','GT100');

Table structure:

CREATE TABLE `incoming_gprs_data` (
 `id` int(200) NOT NULL AUTO_INCREMENT,
 `dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `data` text NOT NULL,
 `type` char(10) NOT NULL,
 `test_udp_id` int(20) NOT NULL,
 `parse_result` text NOT NULL,
 `completed` tinyint(1) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `completed` (`completed`)
) ENGINE=InnoDB AUTO_INCREMENT=5478246 DEFAULT CHARSET=latin1

Activities related to this table:

  1. Around 200 rows are inserted into this table every second. The incoming data originates from different sources, so the inserts come from multiple processes, not a single one.
  2. A cron process fetches these rows via SELECT * FROM incoming_gprs_data WHERE completed = 0, processes them, and updates completed = 1.
  3. Another cron process (running every 15 minutes) deletes the completed rows (i.e. completed = 1) to keep the table slim.
  4. The slow query log does not show any slow SELECT queries related to this table.
  5. The size of the table is relatively small, fewer than 200K rows.

The reason we are doing #2 and #3 is that we previously discovered that deleting completed rows immediately took time, because the index needs to be rebuilt. Therefore, we added the completed flag and perform the deletion less frequently. These changes helped reduce the number of slow queries.
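For concreteness, the processing pass in #2 looks roughly like the sketch below (the connection details and the process_row helper are placeholders for illustration, not our actual code):

    import mysql.connector  # assumption: the cron job is a Python script using mysql-connector-python

    conn = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = conn.cursor()

    # Fetch a chunk of unprocessed rows, process them, and mark them completed.
    cur.execute("SELECT id, data, type FROM incoming_gprs_data WHERE completed = 0 LIMIT 1000")
    for row_id, data, row_type in cur.fetchall():
        parse_result = process_row(data, row_type)  # process_row is a hypothetical parsing step
        cur.execute(
            "UPDATE incoming_gprs_data SET parse_result = %s, completed = 1 WHERE id = %s",
            (parse_result, row_id))
    conn.commit()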

Here are the InnoDB settings we have:

+---------------------------------+------------------------+
| Variable_name                   | Value                  |
+---------------------------------+------------------------+
| have_innodb                     | YES                    |
| ignore_builtin_innodb           | OFF                    |
| innodb_adaptive_flushing        | ON                     |
| innodb_adaptive_hash_index      | ON                     |
| innodb_additional_mem_pool_size | 8388608                |
| innodb_autoextend_increment     | 8                      |
| innodb_autoinc_lock_mode        | 1                      |
| innodb_buffer_pool_instances    | 2                      |
| innodb_buffer_pool_size         | 6442450944             |
| innodb_change_buffering         | all                    |
| innodb_checksums                | ON                     |
| innodb_commit_concurrency       | 0                      |
| innodb_concurrency_tickets      | 500                    |
| innodb_data_file_path           | ibdata1:10M:autoextend |
| innodb_data_home_dir            |                        |
| innodb_doublewrite              | OFF                    |
| innodb_fast_shutdown            | 1                      |
| innodb_file_format              | Antelope               |
| innodb_file_format_check        | ON                     |
| innodb_file_format_max          | Antelope               |
| innodb_file_per_table           | ON                     |
| innodb_flush_log_at_trx_commit  | 2                      |
| innodb_flush_method             | O_DIRECT               |
| innodb_force_load_corrupted     | OFF                    |
| innodb_force_recovery           | 0                      |
| innodb_io_capacity              | 200                    |
| innodb_large_prefix             | OFF                    |
| innodb_lock_wait_timeout        | 50                     |
| innodb_locks_unsafe_for_binlog  | OFF                    |
| innodb_log_buffer_size          | 67108864               |
| innodb_log_file_size            | 536870912              |
| innodb_log_files_in_group       | 2                      |
| innodb_log_group_home_dir       | ./                     |
| innodb_max_dirty_pages_pct      | 75                     |
| innodb_max_purge_lag            | 0                      |
| innodb_mirrored_log_groups      | 1                      |
| innodb_old_blocks_pct           | 37                     |
| innodb_old_blocks_time          | 0                      |
| innodb_open_files               | 300                    |
| innodb_purge_batch_size         | 20                     |
| innodb_purge_threads            | 0                      |
| innodb_random_read_ahead        | OFF                    |
| innodb_read_ahead_threshold     | 56                     |
| innodb_read_io_threads          | 4                      |
| innodb_replication_delay        | 0                      |
| innodb_rollback_on_timeout      | OFF                    |
| innodb_rollback_segments        | 128                    |
| innodb_spin_wait_delay          | 6                      |
| innodb_stats_method             | nulls_equal            |
| innodb_stats_on_metadata        | OFF                    |
| innodb_stats_sample_pages       | 8                      |
| innodb_strict_mode              | OFF                    |
| innodb_support_xa               | ON                     |
| innodb_sync_spin_loops          | 30                     |
| innodb_table_locks              | ON                     |
| innodb_thread_concurrency       | 0                      |
| innodb_thread_sleep_delay       | 10000                  |
| innodb_use_native_aio           | OFF                    |
| innodb_use_sys_malloc           | ON                     |
| innodb_version                  | 1.1.8                  |
| innodb_write_io_threads         | 4                      |
+---------------------------------+------------------------+

We set our innodb_buffer_pool_size to 6G after calculating it with the following SQL query:

SELECT CEILING(Total_InnoDB_Bytes*1.6/POWER(1024,3)) RIBPS
FROM (SELECT SUM(data_length+index_length) Total_InnoDB_Bytes
      FROM information_schema.tables
      WHERE engine='InnoDB') A;

It returned a result of 5 GB, and we estimate that our InnoDB tables won't exceed this size.

Our primary concern at the moment is how to speed up the inserts into this table, and what causes the occasional slow insert queries.

Heru S
  • Which version of MySQL are you using? I'm asking to see whether you have event scheduler functionality in the version you're using. If you have the event scheduler available, you can use it instead of cron and have a self-cleaning database. Apart from this, what Ollie Jones posted is the answer you're after: wrap multiple insertions in a single transaction. That way only one I/O operation of the hard disk is used to save multiple records. You'll trade disk bandwidth for disk I/O. You have lots of bandwidth but very, very limited I/O. That's why your inserts are slow. – N.B. Sep 03 '16 at 14:28
  • @N.B. Thanks! Version is MySQL 5.5 ya. – Heru S Sep 04 '16 at 03:20
  • In the end, per Bill Karwin's suggestion below, we used RabbitMQ to resolve this problem, as we are only using the table to queue the data for processing, and RabbitMQ seems to be doing the job pretty well. – Heru S Jan 21 '19 at 02:23

5 Answers

6

As you know, inserting 200 rows a second is a lot. It is worth your trouble to try to optimize this data flow on an application of this scale.

InnoDB uses database transactions on all insertions. That is, every insert looks like this:

 START TRANSACTION;
 INSERT something...;
 COMMIT;

If you don't specify these transactions, you get autocommit behavior.

The secret to doing insertions at high volume is to do many of them in each transaction, like so:

 START TRANSACTION;
 INSERT something...;
 INSERT something...;
 ...
 INSERT something...;
 INSERT something...;
 INSERT something...;
 COMMIT;
 START TRANSACTION;
 INSERT something...;
 INSERT something...;
 ...
 INSERT something...;
 INSERT something...;
 INSERT something...;
 COMMIT;
 START TRANSACTION;
 INSERT something...;
 INSERT something...;
 ...
 INSERT something...;
 INSERT something...;
 INSERT something...;
 COMMIT;

I have had good success with up to one hundred INSERT commands before each COMMIT.

Do not forget the final COMMIT! Don't ask me how I know to give this advice. :-)
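In application code, the batching might look roughly like this sketch (Python with mysql-connector-python is assumed; the connection details and the incoming_records source are made up for illustration):

    import mysql.connector  # assumption: the inserting process is a Python script using mysql-connector-python

    conn = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = conn.cursor()

    def flush(batch):
        # Many INSERTs inside one transaction: a single commit (one log flush) per batch.
        for data, rec_type in batch:
            cur.execute(
                "INSERT INTO incoming_gprs_data (data, type) VALUES (%s, %s)",
                (data, rec_type))
        conn.commit()

    batch = []
    for data, rec_type in incoming_records():  # incoming_records() is a made-up source of rows
        batch.append((data, rec_type))
        if len(batch) >= 100:                  # roughly one hundred INSERTs before each COMMIT
            flush(batch)
            batch = []
    if batch:
        flush(batch)                           # the final COMMIT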

Another way to do this in MySQL is with multi-row INSERT commands. In your case, they might look like this:

INSERT INTO incoming_gprs_data (data,type) VALUES
    ('3782379837891273|890128398120983891823881abcabc','GT100'),
    ('3782379837891273|890128398120983891823881abcabd','GT101'),
    ('3782379837891273|890128398120983891823881abcabe','GT102'),
       ...
    ('3782379837891273|890128398120983891823881abcabf','GT103'),
    ('3782379837891273|890128398120983891823881abcac0','GT104');
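If the rows pass through client code, the client library can often build such a statement for you. For instance, a rough sketch with mysql-connector-python, whose executemany() sends simple INSERT ... VALUES batches as a single multi-row INSERT (connection details and sample rows are made up):

    import mysql.connector

    conn = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = conn.cursor()

    rows = [
        ('3782379837891273|890128398120983891823881abcabc', 'GT100'),
        ('3782379837891273|890128398120983891823881abcabd', 'GT101'),
    ]
    # executemany() batches simple INSERT ... VALUES statements into one multi-row INSERT.
    cur.executemany("INSERT INTO incoming_gprs_data (data, type) VALUES (%s, %s)", rows)
    conn.commit()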

A third way, the hardest but the highest-performance way, to get a very high insert rate is to store your batches of data in text files, and then use the LOAD DATA INFILE command to put the data into your table. This technique can be very fast indeed, especially if the file can be loaded directly from the file system of your MySQL server.
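A rough sketch of that approach, assuming the batches are written as tab-separated files and that local-infile is enabled on both the client and the server (file path and connection details are made up):

    import csv
    import mysql.connector

    conn = mysql.connector.connect(user='app', password='***', database='gprs',
                                   allow_local_infile=True)  # placeholder credentials
    cur = conn.cursor()

    batch = [('3782379837891273|890128398120983891823881abcabc', 'GT100')]  # rows collected elsewhere

    # Write the batch to a tab-separated file...
    with open('/tmp/gprs_batch.tsv', 'w', newline='') as f:
        csv.writer(f, delimiter='\t').writerows(batch)

    # ...then load the whole file with a single statement.
    cur.execute(
        "LOAD DATA LOCAL INFILE '/tmp/gprs_batch.tsv' "
        "INTO TABLE incoming_gprs_data (data, type)")
    conn.commit()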

I suggest you try the transaction stuff first to see if you get the performance you need.

Another thing: if you have a quiet time of day or night, you can delete the completed rows then, rather than every fifteen minutes. In any case, when you read back these rows to process or to delete, you should use a transaction-batch process like this:

   done = false   /* pseudocode for your programming language */
   while not done {
       DELETE FROM table WHERE completed = 1 LIMIT 50;
       if that query handled zero rows {
           done = true
       }
   }

This will do your deletion operation in reasonably sized transactional batches. Your occasional two-second insertion delay is probably a result of a very large transactional batch on your processing or deletion.
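For reference, a concrete version of that pseudocode loop might look like this (a sketch only; connection details are placeholders):

    import mysql.connector

    conn = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = conn.cursor()

    while True:
        cur.execute("DELETE FROM incoming_gprs_data WHERE completed = 1 LIMIT 50")
        conn.commit()              # each 50-row chunk is its own small transaction
        if cur.rowcount == 0:      # the last DELETE handled zero rows, so we are done
            break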

O. Jones
  • Great answer, which I'd expand with an example of how to use the event scheduler for a scheduled table-cleanup routine so that cron can be avoided. The reason for the event scheduler instead of cron: fewer programs used, fewer things that can go wrong. Apart from that, great answer, +1! – N.B. Sep 03 '16 at 14:30
  • Thanks. I will try doing the deletion at the quieter times instead of every 15 minutes. Also, I am thinking of writing the data into a "log" first and executing the whole insert at one time (say, every minute). Is writing to a "log" file faster than inserting into the table directly? – Heru S Sep 04 '16 at 02:02
  • @N.B : How do you plan to do the event scheduler? Thanks. – Heru S Sep 04 '16 at 02:02
  • @HeruS - the query you run with cron gets executed by the event scheduler. I won't go into what the event scheduler is; the MySQL docs cover it and it's really, really easy to use. Also, writing to a log file could be faster, but it comes with downsides. It's faster because the OS schedules how the HDD is used and will happily cache the data anywhere before committing it to the drive (hence the "speed", but your data isn't really safe on permanent storage). – N.B. Sep 04 '16 at 07:47
  • Thanks for info @N.B. Will definitely check on this feature. – Heru S Sep 05 '16 at 16:23
  • 100 per commit is close to optimal, and it is about 10 times as fast as 100 individual commits. 1000 per commit is not noticeably faster (plus it has issues with replication, etc.). – Rick James Sep 05 '16 at 16:49
  • Ollie, I will disagree with that approach to the `DELETE`. It will get slower and slower as the next 50 `completed` items are later and later in the table. One is better walking through the table on the `PRIMARY KEY`. [_More details of efficient chunking_](http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks). – Rick James Sep 05 '16 at 16:52
  • Great article! Always happy to see my fellow performance evangelists in action :) – dkellner Apr 06 '20 at 14:18
1
  • "Around 200 rows are inserted into this table every second." One at a time? It is much better to do a single multi-row INSERT.
  • "The slow query log does not show any slow SELECT queries related to this table." Lower long_query_time; the default of 10 seconds is virtually useless.
  • "A cron process will get the rows via SELECT * FROM incoming_gprs_data WHERE completed = 0."

Don't scan the entire table all at once. Walk through the table, preferably via the PRIMARY KEY, doing, say, 1000 rows at a time. More details on chunking: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
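As an illustration, walking the PRIMARY KEY in 1000-row chunks might look roughly like this (a sketch; the connection details and the process_row function are assumptions):

    import mysql.connector

    conn = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = conn.cursor()

    last_id = 0
    while True:
        # Fetch the next chunk by PRIMARY KEY instead of scanning the whole table at once.
        cur.execute(
            "SELECT id, data, type FROM incoming_gprs_data "
            "WHERE completed = 0 AND id > %s ORDER BY id LIMIT 1000", (last_id,))
        rows = cur.fetchall()
        if not rows:
            break
        for row_id, data, rec_type in rows:
            process_row(data, rec_type)  # process_row is a hypothetical processing step
        cur.execute(
            "UPDATE incoming_gprs_data SET completed = 1 WHERE id BETWEEN %s AND %s",
            (rows[0][0], rows[-1][0]))
        conn.commit()                    # one transaction per 1000-row chunk
        last_id = rows[-1][0]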

The index is not "rebuilt", it is always incrementally updated. (I hope you are not explicitly rebuilding it!)

I assume you have at least 8GB of RAM? (The buffer_pool is my clue, which should be about 70% of available RAM.)

int(200) -- the (200) means nothing. An INT is 4 bytes regardless.

Don't do two cron processes; go ahead and delete on the first pass. The UPDATE to set completed is about as costly as the DELETE.

More

If you cannot "batch" the inserts, can you at least put them in a single "transaction" (BEGIN ... COMMIT)? Ditto for the DELETEs. For data integrity, there is at least one disk hit per transaction. So, doing several operations in a single transaction decreases I/O, thereby speeding up the query. But don't get carried away; if you do a million inserts/deletes/updates in a single transaction, there are other issues.

Another thing that can be done to decrease the I/O overhead: innodb_flush_log_at_trx_commit = 2, which is faster, but less safe than the default of 1. If your "200 inserts/sec" is 200 transactions (such as with autocommit=1), this setting change can make a big difference.

Rick James
  • I cannot do multi-row inserts, mainly because the data to be inserted arrives from different events (but around 200 rows of data each second) :(. The slow query log threshold has been set to 1 second. As for the scanning, we actually set a LIMIT of around 1000 on our query. As for `UPDATE` being better than `DELETE`, I found it at http://stackoverflow.com/a/14299585/643366 and it works for us. – Heru S Sep 04 '16 at 01:59
  • I added some more advice. The SO link says some wrong things, so I am not sure how far to believe it. – Rick James Sep 04 '16 at 05:54
  • We cannot use `BEGIN` ... `COMMIT`, mainly because the data is coming from multiple sources at one time :(. We have already set `innodb_flush_log_at_trx_commit = 2`, since we are OK with losing some data. We have come up with a different solution: make the table `ENGINE=MEMORY` and use batch inserts for the processed data (point #2 in my question). – Heru S Sep 05 '16 at 16:31
  • One benchmark showed that using InnoDB for a staging table was actually faster than Memory; probably because of row-level versus table-level locking. See [_High Speed Ingestion_](https://mariadb.com/kb/en/mariadb/data-warehousing-high-speed-ingestion/) – Rick James Sep 05 '16 at 16:47
1

You posted in your own answer:

To resolve this, we altered the incoming_gprs_data table to use the MEMORY engine. This table acts like a temporary table to gather all the incoming data from different sources. A cron job then processes these data, inserts them into another table processed_data_xxx, and finally deletes them. This removes all the slow insert queries.

You should use a message queue for this, not a database. If you have a workflow that processes data and then deletes it, this sounds perfect for a message queue. There are many message queues that can handle 200 entries per second easily.

Instead of a cron job to update and delete records from a database, you could just have an application listening to a topic on the message queue, process an item, and then... nothing. No need to store that item, just move on to the next item from the queue.

We use Apache ActiveMQ at my current company. I know other developers who recommend RabbitMQ as well.
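A minimal consumer along those lines, assuming RabbitMQ and its Python client pika (the queue name and the process_gprs_record function are made up), might look like this:

    import pika  # RabbitMQ's Python client

    def handle(ch, method, properties, body):
        process_gprs_record(body)                       # process_gprs_record is a hypothetical handler
        ch.basic_ack(delivery_tag=method.delivery_tag)  # acknowledging replaces the UPDATE/DELETE step

    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = conn.channel()
    channel.queue_declare(queue='incoming_gprs_data', durable=True)
    channel.basic_consume(queue='incoming_gprs_data', on_message_callback=handle)
    channel.start_consuming()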

Bill Karwin
  • Thanks, Bill! Reading about MQ, this is definitely similar to our `incoming_gprs_data` table. We will look into this further. – Heru S Sep 06 '16 at 00:32
0

In the end, we ended up with a different solution, mainly because the incoming data we are inserting comes from different sources (hence, different processes). Therefore, we cannot use multi-row INSERT or START TRANSACTION and COMMIT in this case.

To resolve this, we altered the incoming_gprs_data table to use the MEMORY engine. This table acts like a temporary table to gather all the incoming data from different sources. A cron job then processes these data, inserts them into another table processed_data_xxx, and finally deletes them. This removes all the slow insert queries.

We do understand the cons of the MEMORY engine (such as high volatility and its indexing limitations). But the speed of writing and reading with the MEMORY engine suits this situation.

When inserting the processed data into the processed_data_xxx table, we followed @Ollie Jones's suggestion to use START TRANSACTION and COMMIT instead of autocommitting each insert query.
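A rough sketch of that cron step (connection details are placeholders, and the columns of processed_data_xxx are assumed to mirror the staging table):

    import mysql.connector

    conn = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = conn.cursor()

    # Drain a chunk from the MEMORY staging table into the processed table inside one transaction.
    # (The MEMORY table itself is non-transactional; only the InnoDB inserts are covered by the COMMIT.)
    cur.execute("SELECT id, data, type FROM incoming_gprs_data LIMIT 1000")
    rows = cur.fetchall()
    if rows:
        cur.executemany(
            "INSERT INTO processed_data_xxx (data, type) VALUES (%s, %s)",
            [(data, rec_type) for _id, data, rec_type in rows])
        placeholders = ",".join(["%s"] * len(rows))
        cur.execute(
            "DELETE FROM incoming_gprs_data WHERE id IN (" + placeholders + ")",
            [row_id for row_id, _data, _rec_type in rows])
        conn.commit()  # a single COMMIT for the whole batch, per Ollie Jones's suggestion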

Heru S
  • Even better is to use InnoDB with essentially no keys. The processing needs no index since you scan the entire table. InnoDB is better because of row-level locking. No need for cron -- simply pause a second and repeat. This will get the data in faster and be self-regulating. See my [_flip-flop technique_](https://mariadb.com/kb/en/mariadb/data-warehousing-high-speed-ingestion/) – Rick James Sep 05 '16 at 16:56
  • Interesting technique, Rick! At the moment, it seems having the 'staging' table by itself is sufficient in our situation. Will definitely remember this when the data in the staging table gets really large quickly! Thanks again! – Heru S Sep 06 '16 at 00:37
0

One fairly pragmatic solution is to not do direct inserts, but to write to a Redis queue and then consume it once per second to do batch inserts. These processes require only a few lines of code (in any language).

Something like: in a loop, read all records from the queue and insert them into MySQL. Then sleep in 100 ms steps until the wall clock is one second further, and start the loop again.

It is very fast and pragmatic, but you lose real-time confirmation of successful inserts into the database. With this method I was able to achieve up to 40k inserts per second on a single machine.
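A minimal sketch of such a consumer, assuming redis-py and mysql-connector-python (the queue key, the JSON record format, and the connection details are made up):

    import json
    import time

    import mysql.connector
    import redis

    r = redis.Redis()                                                           # local Redis, defaults assumed
    db = mysql.connector.connect(user='app', password='***', database='gprs')  # placeholder credentials
    cur = db.cursor()

    while True:
        batch = []
        while True:
            item = r.lpop('incoming_gprs_data')   # producers RPUSH JSON records onto this (made-up) key
            if item is None:
                break
            rec = json.loads(item)
            batch.append((rec['data'], rec['type']))
        if batch:
            cur.executemany(
                "INSERT INTO incoming_gprs_data (data, type) VALUES (%s, %s)", batch)
            db.commit()                           # one commit per batch instead of one per row
        time.sleep(1)                             # roughly once per second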

mevdschee