Best way to maintain data integrity between local and remote sql databases

Question

So I have, what would seem like a common question that I can't seem to find an answer to. I'm trying to find what is the "best practice" for how to architect a database that maintains data locally, then syncs that data to a remote database that is shared between many clients. To make things more clear, this remote database would have many clients that use it.

For example, if I had a desktop application that stored to-do lists (in SQL) that had individual items. Then I want to be able to send that data to a web-service that had a "master" copy of all the different clients information. I'm not worried about syncing problems as much as I am just trying to think through actual architecture of the client's tables and the web-services tables

Here's an example of how I was thinking about it:

Client Database

list
--list_client_id (primary key, auto-increment)
--list_name

list_item
--list_item_client_id (primary key, auto-increment)
--list_id
--list_item_text

Web Based Master Database (Shared between many clients)

list
--list_master_id
--list_client_id (primary key, auto-increment)
--list_name
--user_id


list_item
--list_item_master_id (primary key, auto-increment)
--list_item_remote_id
--list_id
--list_item_text
--user_id

The idea would be that the client can create todo lists with items, and sync this with the web service at any given time (i.e. if they lose data connectivity, and aren't able to send the information until later, nothing will get out of order). The web service would record the records with the clients id's as just extra fields. That way, the client can say "update list number 4 with a new name" and the server takes this to mean "update user 12's list number 4 with a new name".

@emaillenin Why would this belong on ServerFault? I'm trying to figure out how to design the tables in the database — JoeCortopassi, Jan 31 '12 at 15:18

spacebiker · Answer 1 · 2014-05-18T15:23:55.147

I have the same issue with a project i am working on, the solution in my case was to create an extra nullable field in the local tables named remote_id. When synchronizing records from local to remote database if remote_id is null, it means that this row has never been synchronized and needs to return a unique id matching the remote row id.

Local Table            Remote Table

_id (used locally)
remote_id ------------- id
name      ------------- name

In the client application i link tables by the _id field, remotely i use the remote id field to fetch data, do joins, etc..

example locally:

Local Client Table       Local ClientType Table      Local ClientType
                         _id
                         remote_id  
_id -------------------- client_id
remote_id                client_type_id -------------- _id
                                                      remote_id
name                    name                          name

example remotely:

Remote Client Table      Remote ClientType Table      Remote ClientType
id -------------------- client_id
                        client_type_id -------------- id
name                    name                          name

This scenario, and without any logical in the code, would cause data integrity failures, as the client_type table may not match the real id either in the local or remote tables, therefor whenever a remote_id is generated, it returns a signal to the client application asking to update the local _id field, this fires a previously created trigger in sqlite updating the affected tables. http://www.sqlite.org/lang_createtrigger.html

1- remote_id is generated in the server

2- returns a signal to client

3- client updates its _id field and fires a trigger that updates local tables that join local _id

Of course i use also a last_updated field to help synchronizations and to avoid duplicated syncs.

score 0 · Accepted Answer · answered Feb 02 '12 at 21:33

I think they general concept you're working with is the right direction, but you may need to pay careful attention to the use of auto-increment columns. For example, auto-increment on the server is useless if the client is the owner of this ID. Instead, you probably want list.list_master_id to be an auto-increment. Everything else you've mentioned is entirely plausible, though the complexity may increase if there may be multiple clients per user. Then, the use of an auto-increment alone probably isn't sufficient. Instead, you may need a guid or a datatype that also includes a client identifier to prevent id collision.

Without having more details it would be difficult to speculate on what other situations you may need to consider.

score 0 · Answer 3 · answered Feb 08 '12 at 10:36

SERVER:
list
--id
--name
--user_id
--updated_at
--created_from_device_id

Those 2 tables link all records, might be grouped in one table also.
list_ids
--list_id
--device_id
--device_record_id

user_ids
--user_id
--device_id
--device_record_id


CLIENT (device_id=5)
list
--id
--name
--user_id
--updated_at

That will allow you to save records as(only showing relevant fields):

server
 list: id=1, name=shopping, user_id=1234
 user: id=27, name=John Doe

 list_ids: list_id=1, device_id=5, device_record_id=999
 user_ids: user_id=27, device_id=5, device_record_id=567

client
 id=999, name=shopping, user_id=567

This way they are totally unaware of any ID's, translations can be done quite fast and you can supply the clients only with information and ID's they know of.

Best way to maintain data integrity between local and remote sql databases

3 Answers3