10

Instagram's Postgres method of implementing custom Ids for Sharding is great, but I need the implementation in MySQL.

So, I converted the method found at the bottom of this blog, here: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

MySQL Version:

CREATE TRIGGER shard_insert BEFORE INSERT ON tablename
FOR EACH ROW BEGIN

DECLARE seq_id BIGINT;
DECLARE now_millis BIGINT;
DECLARE our_epoch BIGINT DEFAULT 1314220021721;
DECLARE shard_id INT DEFAULT 1;

SET now_millis = (SELECT UNIX_TIMESTAMP(NOW(3)) * 1000);
SET seq_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "dbname" AND TABLE_NAME = "tablename");
SET NEW.id = (SELECT ((now_millis - our_epoch) << 23) | (shard_id << 10) | (SELECT MOD(seq_id, 1024)));
END

The table looks roughly like this:

CREATE TABLE tablename (
    id BIGINT AUTO_INCREMENT,
    ...
)

Question:

  1. There is a concurrency problem here. When spawning 100 threads and running inserts, I am getting duplicate sequence values, meaning two triggers are getting the same auto_increment value. How can I fix this?

I tried creating a new table, e.g. "tablename_seq", with one row, a counter to store my own auto_increment values, then doing updates to that table inside the TRIGGER, but the problem is I can't LOCK the table in a Stored Procedure (trigger), so I have the exact same problem, I can't guarantee a counter to be unique between triggers :(.

I'm stumped and really would appreciate any tips!

Possible Solution:

  1. MySQL 5.6 has UUID_SHORT() which generates unique incrementing values which are guaranteed to be unique. It appears in practice when calling this that each call increments the value +1. By using: SET seq_id = (SELECT UUID_SHORT()); it appears to remove the concurrency problem. The side effect of this is that now (roughly) no more than 1024 inserts can happen per millisecond in the entire system. If more do, then it's possible for a DUPLICATE PRIMARY KEY error. The good news is that in benchmarks on my machine, I get ~3,000 inserts/s with or wtihout the trigger contianing UUID_SHORT(), so it doesn't appear to slow it down at all.
jsidlosky
  • 407
  • 1
  • 5
  • 17
  • would a simple unique index on the table column AUTO_INCREMENT do your job? that way you generate an alert if a second generated ID is one that already exists in the table. and you probably can catch this alert an let the procedure re-run. not the most efficient way but it would most likely work. – dom Sep 07 '14 at 06:56
  • Maybe something like this [SQL Fiddle](http://sqlfiddle.com/#!2/a6fd4/1) can give you, at least, ideas on how to implement what you need. – wchiquito Sep 07 '14 at 11:10
  • @wchiquito Thanks much for taking the time to create the [SQL Fiddle](http://sqlfiddle.com/#!2/a6fd4/1) link. However, upon running it, it appears to have the same concurrency problem. E.g. 11828889504449540 was repeated 3 times as the 3rd-6th IDs in the test. Did it work for you? That's a clever idea to use an insert/last_insert_id on another table in a stored proc! However, it doesn't appear to work. – jsidlosky Sep 07 '14 at 18:49
  • @dom I'm really looking for a solution without a retry occurring, thanks for trying though! – jsidlosky Sep 07 '14 at 18:51
  • At the interface of [SQL Fiddle](http://sqlfiddle.com/) repetitions are shown, however, the `id` column in the table `tablename` is a PRIMARY KEY, also in a local environment with MySQL running the same script, there are no repetitions. – wchiquito Sep 07 '14 at 19:21
  • @wchiquito Thanks! It appears you've solved the problem. If you re-post your comment as an answer, I should be able to release the bounty to you. – jsidlosky Sep 09 '14 at 00:05

2 Answers2

2

The following SQL Fiddle generates an output as shown below:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 45
Server version: 5.5.35-1

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select `id` from `tablename`;
+-------------------+
| id                |
+-------------------+
| 11829806563853313 |
| 11829806563853314 |
| 11829806563853315 |
| 11829806563853316 |
| 11829806563853317 |
| 11829806563853318 |
| 11829806563853319 |
| 11829806563853320 |
| 11829806563853321 |
| 11829806563853322 |
| 11829806563853323 |
| 11829806563853324 |
| 11829806563853325 |
| 11829806563853326 |
| 11829806563853327 |
| 11829806563853328 |
| 11829806563853329 |
| 11829806563853330 |
| 11829806563853331 |
| 11829806563853332 |
| 11829806563853333 |
| 11829806563853334 |
| 11829806563853335 |
| 11829806563853336 |
| 11829806563853337 |
| 11829806563853338 |
| 11829806563853339 |
| 11829806563853340 |
| 11829806563853341 |
| 11829806563853342 |
| 11829806563853343 |
| 11829806563853344 |
| 11829806563853345 |
| 11829806563853346 |
| 11829806563853347 |
| 11829806563853348 |
| 11829806563853349 |
| 11829806563853350 |
| 11829806563853351 |
| 11829806563853352 |
+-------------------+
40 rows in set (0.01 sec)

Accept the answer if it really solves your need.

UPDATE

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 46
Server version: 5.5.35-1

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> DELIMITER //

mysql> DROP FUNCTION IF EXISTS `nextval`//
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> DROP TRIGGER IF EXISTS `shard_insert`//
Query OK, 0 rows affected (0.00 sec)

mysql> DROP TABLE IF EXISTS `tablename_seq`, `tablename`;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `tablename_seq` (
    ->   `seq` BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY
    -> )//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `tablename` (
    ->   `id` BIGINT UNSIGNED PRIMARY KEY
    -> )//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FUNCTION `nextval`()
    -> RETURNS BIGINT UNSIGNED
    -> DETERMINISTIC
    -> BEGIN
    ->   DECLARE `_last_insert_id` BIGINT UNSIGNED;
    ->   INSERT INTO `tablename_seq` VALUES (NULL);
    ->   SET `_last_insert_id` := LAST_INSERT_ID();
    ->   DELETE FROM `tablename_seq`
    ->   WHERE `seq` = `_last_insert_id`;
    ->   RETURN `_last_insert_id`;
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TRIGGER `shard_insert` BEFORE INSERT ON `tablename`
    -> FOR EACH ROW
    -> BEGIN
    ->   DECLARE `seq_id`, `now_millis` BIGINT UNSIGNED;
    ->   DECLARE `our_epoch` BIGINT UNSIGNED DEFAULT 1314220021721;
    ->   DECLARE `shard_id` INT UNSIGNED DEFAULT 1;
    ->   SET `now_millis` := `our_epoch` + UNIX_TIMESTAMP();
    ->   SET `seq_id` := `nextval`();
    ->   SET NEW.`id` := (SELECT (`now_millis` - `our_epoch`) << 23 |
    ->                            `shard_id` << 10 |
    ->                            MOD(`seq_id`, 1024)
    ->                   );
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO `tablename`
    -> VALUES
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0),
    -> (0), (0), (0), (0), (0)//
Query OK, 40 rows affected (0.00 sec)
Records: 40  Duplicates: 0  Warnings: 0

mysql> DELIMITER ;

mysql> SELECT `id` FROM `tablename`;
+-------------------+
| id                |
+-------------------+
| 12581084357198849 |
| 12581084357198850 |
| 12581084357198851 |
| 12581084357198852 |
| 12581084357198853 |
| 12581084357198854 |
| 12581084357198855 |
| 12581084357198856 |
| 12581084357198857 |
| 12581084357198858 |
| 12581084357198859 |
| 12581084357198860 |
| 12581084357198861 |
| 12581084357198862 |
| 12581084357198863 |
| 12581084357198864 |
| 12581084357198865 |
| 12581084357198866 |
| 12581084357198867 |
| 12581084357198868 |
| 12581084357198869 |
| 12581084357198870 |
| 12581084357198871 |
| 12581084357198872 |
| 12581084357198873 |
| 12581084357198874 |
| 12581084357198875 |
| 12581084357198876 |
| 12581084357198877 |
| 12581084357198878 |
| 12581084357198879 |
| 12581084357198880 |
| 12581084357198881 |
| 12581084357198882 |
| 12581084357198883 |
| 12581084357198884 |
| 12581084357198885 |
| 12581084357198886 |
| 12581084357198887 |
| 12581084357198888 |
+-------------------+
40 rows in set (0.00 sec)

See db-fiddle.

wchiquito
  • 16,177
  • 2
  • 34
  • 45
2

An alternative is to grab blocks of auto increment numbers. If you set MySQLs auto increment increment to something like 1000, a process can do an insert in the "sequence" table and get the auto increment value. The process then knows that it has 1000 sequential numbers it can use, starting at that number, that will be free of conflicts. There is no need to record every increment in a central table if all you are recording is a number.

This is most commonly used in multiple master setups in addition to the auto increment offset. You could go the multiple master route too, and insert on the different masters. The auto increment increment and offset would assure no conflicts. This would require solid knowledge of MySQL replication.

Brent Baisley
  • 12,641
  • 2
  • 26
  • 39