Sharding and ID generation as instagram

Question

My question is regarding ID generation for sharded environment. I am following the same steps as instagram did for generating unique ids. I have a few question on the implementation of this id generation in MySQL.

This is how the ID is being generated (This is a PGQL stored procedure.)

CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;
BEGIN
    SELECT nextval('insta5.table_id_seq') %% 1024 INTO seq_id;

    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;



CREATE TABLE insta5.our_table (
    "id" bigint NOT NULL DEFAULT insta5.next_id(),
    ...rest of table schema...
)

MY question is how can I do that in MySQL (equivalent code). This stored procedure should be called from a query to get the next id.

My other question is regarding querying the shards. From what I understand they use logical shards that are maps to actual servers. If they decide to map a logical shard to a new server, this means that they have to query two servers at the same time to aggregate the results, because before they assigned a new server for the same logical shard, some data was added to the older server. I just wanted to know if there's a better way to use logical shards and having the option to query a single server where the data resides instead of querying all the servers that are and were belong to that logical shard?

Thanks.

score 1 · Answer 1 · answered Oct 22 '12 at 13:14

1

By the look of the code it looks like you just need to replicate sequence, you can do it by creating table in MySQL with AUTO_INCREMENT and use it for generating identity numbers.

answered Oct 22 '12 at 13:14

Farfarak

1,497
1
8
8

but how still can't ordered results from several shards if I run AI on every shard start from 1. I want to page the data (ex. 10 last records, page 2 next 10 records and so on). – Liron Harel Oct 22 '12 at 21:36
What I meant was to replace this line of code in PL/SQL code: SELECT nextval('insta5.table_id_seq') %% 1024 INTO seq_id;, with Auto_Increment column value from the table. And rewrite the rest of the query in mysql. – Farfarak Oct 23 '12 at 08:10

Yugandhar Chaudhari · Accepted Answer · 2019-12-17T11:57:10.360

This is postgres equivalent

Specify TABLE_SCHEMA and MASKTABLE below

I am creating a MASKTABLE with a dummy first entry

DELIMITER $$
CREATE OR REPLACE FUNCTION generate_next_id() RETURNS bigint NOT DETERMINISTIC
MAIN: BEGIN
DECLARE our_epoch bigint;
DECLARE seq_id bigint;
DECLARE now_millis bigint;
DECLARE shard_id int;
DECLARE param bigint ;
SET @our_epoch = 1568873367231;
SET @shard_id = 1;
SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "SCHEMANAME" AND TABLE_NAME = "MASKTABLE" into @seq_id;
SELECT FLOOR(UNIX_TIMESTAMP()) * 1000 into @now_millis;
SELECT (@now_millis - @our_epoch) << 23 into @param;
SELECT @param | (@shard_id <<10) into @param; 
select @param | (@seq_id) into @param; 
RETURN @param;
END MAIN;$$ 
DELIMITER ;

Usage

select generate_next_id()

You can use it in trigger like

CREATE TRIGGER trigger_name
BEFORE INSERT ON TableName 
FOR EACH ROW
SET new.id = generate_next_id();

score 0 · Answer 3 · edited Nov 06 '12 at 22:44

0

At ScaleBase we found a nice way to provide support for autoincrements, through the variable: SET @@auto_increment_increment=4;.

Just make sure each shard has a starting offset and you're free to go. It's straight, simple, compatible with existing MySQL and stuff.

edited Nov 06 '12 at 22:44

David Wolever

148,955
89
346
502

answered Oct 22 '12 at 15:54

Doron Levari

566
4
4

Does scalebase can maintain the incremental Id's. I ask that because my URLs are bound to the primary key (website.com/?record=112 , 112 is a primary key to that specific record) – Liron Harel Oct 22 '12 at 23:58

score 0 · Answer 4 · answered May 09 '14 at 09:47

Regarding the other question, all data for one logical shard only live on one server in production at a time (without taking replication into account). When they're running out of capacity of their existing servers, they duplicate data in each server to a follower, and when the new server is ready (at this point both the original server and the new server contain all data for a specific logical shard), they start reading all data for half of the shards from the new server, and the other half stays in the original server. See e.g. http://www.craigkerstiens.com/2012/11/30/sharding-your-database/ for illustration.

score 0 · Answer 5 · edited May 23 '17 at 10:33

I have a question out for the conversion routine I wrote, but it does appear to work!

Can AUTO_INCREMENT be safely used in a BEFORE TRIGGER in MySQL

Once I get the question answered, I'll update this answer as well.

MySQL Version:

CREATE TRIGGER shard_insert BEFORE INSERT ON tablename
FOR EACH ROW BEGIN

DECLARE seq_id BIGINT;
DECLARE now_millis BIGINT;
DECLARE our_epoch BIGINT DEFAULT 1314220021721;
DECLARE shard_id INT DEFAULT 1;

SET now_millis = (SELECT UNIX_TIMESTAMP());
SET seq_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "dbname" AND TABLE_NAME =     "tablename");
SET NEW.id = (SELECT ((now_millis - our_epoch) << 23) | (shard_id << 10) | (SELECT MOD(seq_id, 1024)));
END

score 0 · Answer 6 · answered Oct 15 '14 at 21:17

Implementing the exact same id generation logic with a MySQL stored procedure is not possible. However, this can be done using a MySQL UDF.

Here is a UDF that recreates instagrams id generator with a few changes. Modifying it to work exactly like the instagram id generator would involve updating the MAX_* constants and the bit shifting logic in next_shard_id and shard_id_to_ms.

Sharding and ID generation as instagram

6 Answers6

Linked