
I have several servers, each running its own instance of a particular MySQL database which unfortunately cannot be set up for replication/clustering. Each server inserts data into several user-related tables that have foreign key constraints between them (e.g. user, user_vote). Here is how the process goes:

  1. all the servers start with the same data
  2. each server grows its own set of data independently from the other servers
  3. periodically, the data from all the servers is manually merged together and applied back to each server (the process then repeats from step 1).

This is made possible because, in addition to its primary key, the user table contains a unique email field which makes it possible to identify which users already exist in each database and to merge in the new ones, while rewriting the primary and foreign keys to avoid collisions and keep the foreign key constraints consistent. It works, but it's quite a lot of effort because the primary and foreign keys have to be rewritten, hence my question:

Is there a way to have each server use primary keys that don't collide with other servers to facilitate the merging?

I initially wanted to use a composite primary key (e.g. server_id, id), but I am using Doctrine, which doesn't support primary keys composed of multiple foreign keys, so I would have problems with my foreign key constraints.

I thought about using a VARCHAR as the id and making part of the string a prefix (SERVER1-1, SERVER1-2, SERVER2-1, SERVER2-2...), but I'm worried it will make the DB slower, as I will have to do some manipulation on the ids (e.g. on insert, I have to parse the existing ids, extract the highest one, increment it, and concatenate it with the server id...).
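
Roughly, each insert would need something like the following to work out the next id (a sketch, assuming ids of the form SERVER1-42 in a hypothetical user table):

-- find the highest numeric suffix for this server's prefix and build the next prefixed id
SELECT CONCAT('SERVER1-',
  COALESCE(MAX(CAST(SUBSTRING_INDEX(`id`, '-', -1) AS UNSIGNED)), 0) + 1) AS next_id
FROM `user`
WHERE `id` LIKE 'SERVER1-%';

And it would have to run under a lock so that two concurrent inserts don't pick the same id, which is exactly the kind of overhead I'd like to avoid.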

PS: Another option would be to implement replication with reads from slaves and writes to the master, but this option was discarded because of issues such as replication lag and the master being a single point of failure, which can't be solved for now.

Max

2 Answers


You can control the step that auto-increment fields increment by, and give each server a different start offset:

Change the step auto_increment fields increment by

(assuming you are using auto-increments)

I've only ever used this across two servers, so my set-up had one with even ids and one with odd.

When they are merged back together nothing will collide, as long as you make sure all tables follow the above idea.

To implement this for 4 servers, you would set the following offsets:

  • Server 1 = 1
  • Server 2 = 2
  • Server 3 = 3
  • Server 4 = 4

You would set the increment for all of them like so (I've used 10 to leave room for extra servers):

  • Server 1 = 10
  • Server 2 = 10
  • Server 3 = 10
  • Server 4 = 10
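
Concretely, these map onto MySQL's auto_increment_increment and auto_increment_offset server variables. A minimal sketch for server 1 (the other servers would use offsets 2, 3 and 4), set at runtime or made permanent in my.cnf:

-- on server 1; repeat on the other servers with auto_increment_offset = 2, 3, 4
SET GLOBAL auto_increment_increment = 10; -- shared step, leaves room for extra servers
SET GLOBAL auto_increment_offset = 1;     -- this server's start offset

(SET GLOBAL only affects new connections; putting the same two settings in my.cnf makes them survive a restart.)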

Then, after you have merged and before copying back to each server, you just need to update the autoinc value of each table so it has the correct offset again. Imagine each server had created 100 rows; the autoincs would be:

  • Server 1 = 1001
  • Server 2 = 1002
  • Server 3 = 1003
  • Server 4 = 1004

This is where it gets tricky with four servers: certain tables may not have had any rows inserted by a particular server, so you could end up with some tables whose last autoinc id came not from server 4 but from server 2 instead. That makes it hard to work out what the next autoinc should be for any particular table.

For this reason it is probably best to also include a column in each of your tables that records the server number when any rows are inserted.

id | field1 | field2 | ... | server
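
For example, something along these lines on each table (a sketch, assuming a hypothetical `user` table; the DEFAULT would be set to the owning server's own number, or the column could be filled in by a trigger):

ALTER TABLE `user` ADD COLUMN `server` TINYINT UNSIGNED NOT NULL DEFAULT 1; -- DEFAULT 1 on server 1, 2 on server 2, etc.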

That way you can easily find out what the last id inserted by a particular server was, by selecting the following on any of your tables:

SELECT MAX(id) FROM `table` WHERE `server` = 4;

Using this value you can reset the next autoinc value you need for each table on each server, before rolling the merged dataset out to the server in question.

Note that information_schema is read-only in MySQL, so the counter has to be reset with ALTER TABLE; and because ALTER TABLE only accepts a literal value, the statement has to be built dynamically (or the value computed first by your merge script):

SET @next := (SELECT MAX(id) FROM `table` WHERE `server` = s) + n;
SET @sql  := CONCAT('ALTER TABLE `table` AUTO_INCREMENT = ', @next);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Where s is the server number and n is the increment (step), so in my example it would be 10.

Pebbl
  • I can clearly see it working with 2 servers, but I'm wondering how I should do it with 4: what if I use `1` as a step for all servers, but use different `start` values on each server that are sufficiently far apart to give me room for growth (e.g. `100000000` for server 1, `200000000` for server 2...), would that work? – Max Nov 24 '12 at 12:40
  • I'm thinking especially that after merging the data, won't servers 1, 2 and 3 increment from the highest id, which will be the one coming from server 4? – Max Nov 24 '12 at 12:47
  • @user359650 Auto-incrementing by a prime number such as 7 or 13 should help: if each server starts at 1, 2, 3, 4, then their corresponding next numbers would be 8, 9, 10, 11 per respective server – DRapp Nov 24 '12 at 14:59
  • No need for a prime number: the `step` just has to be equal to the number of servers. Start with `1,2,3,4` and use `4` as a step: next sequences are `5,6,7,8` - `9,10,11,12` - `13,14,15,16` etc... – Max Nov 24 '12 at 15:16
  • @user359650 I've updated my answer with hopefully useful info. – Pebbl Nov 24 '12 at 15:47
  • This is tricky and I would not like to be the admin of that system when anything fails - recovery may be quite a PITA. The advantage of the prefix trick is that the id is unique by construction. This approach is problematic because you need to estimate how many rows each server can insert and set the offsets so they won't overlap. If one day you need to scale up beyond your original estimates, it will not be your happiest day – Marcin Orlowski Nov 24 '12 at 15:50
  • @WebnetMobile.com very true, however if the OP is already using numeric autoinc IDs changing a system to using strings or a combination of strings and numbers instead might be quite a bit of work - depends on the code. For many systems expanding to beyond 10 servers would probably mean an infrastructure change anyway -- unless they were already cloud/replication/cluster based. Adding a server column would be quite easy and could be populated using MySQL triggers without modification to the original code. – Pebbl Nov 24 '12 at 15:58

Prefixing the ID would do the trick. As for the DB being slower - that depends on how much traffic it serves. You can also have the "prefixed id" split into two columns, "prefix" and "id", and these can be of any type. It would require some logic in your requests to cope with it, but it may be worth evaluating.
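
A rough sketch of the two-column variant (hypothetical tables; whether Doctrine will accept the composite key being referenced by foreign keys is a separate question, as noted in the question above):

CREATE TABLE `user` (
  `server` TINYINT UNSIGNED NOT NULL,  -- the "prefix" part, one value per server
  `id`     INT UNSIGNED NOT NULL,      -- per-server counter
  `email`  VARCHAR(255) NOT NULL,
  PRIMARY KEY (`server`, `id`),
  UNIQUE KEY (`email`)
) ENGINE=InnoDB;

CREATE TABLE `user_vote` (
  `user_server` TINYINT UNSIGNED NOT NULL,
  `user_id`     INT UNSIGNED NOT NULL,
  `vote`        TINYINT NOT NULL,
  FOREIGN KEY (`user_server`, `user_id`) REFERENCES `user` (`server`, `id`)
) ENGINE=InnoDB;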

Marcin Orlowski