2

I'm building an iOS app that works (creates, edits) records (in sqlite) that are synced to/from a server database. If a record in the app has been downloaded from the server and then modified locally, I'm making a copy since I want to be able to revert to the server version. Therefore, for a given record id, I can sometimes have two copies (server, local). I am asking for assistance in designing the database layout.

Initially, I used two tables - one for storing the server records (arriving via sync), and one for storing locally modified / locally created (yet to be synced) records. I found this approach to be cumbersome, since (a) I need to do aggregate searches (select records giving priority to the locally modified copy), (b) I needed to move data from one table to another, which doesn't sound like a good practice, and (c) the schema is very complex (hundreds of columns) and maintaining the two tables schemas in sync was difficult.

I then merged everything in one table, adding a status column (server/local). This seemed fine, until I realised how complicated it was to filter the duplicate records (those for which both a server and local copy existed). Counting, searching, selecting needed 10-line intricate queries (because of the limitations in sqlite) - see my other questions here and here.

I'm contemplating now keeping everything in one table, but ditching the status column, and creating a separate table for tracking statuses, one row per record, something like this:

Data:

id    recordID  name                                 col2   col3   ...
1     1001      Server record, not changed locally   xxxx   xxxx   ...
2     1002      Server record changed locally        xxxx   xxxx   ...
3     1002      Server record changed locally        xxxx   yyyy   ...
4     1003      Record created locally               xxxx   xxxx   ...
5     1004      Server record changed locally        xxxx   xxxx   ...
6     1004      Server record changed locally        xxxx   yyyy   ...

Status tracking:

id    recordID  server  local
1     1001      1
2     1002      2       3
3     1003              4
4     1004      5       6

Aggregating the above information for display would mean showing the local record, if any, or the server record otherwise - in this case, data rows 1,3,4 and 6. In this case, my queries would be simpler (just a join with a case).

Would this be the best approach, or is there a better design I should use?

Community
  • 1
  • 1
lucianf
  • 547
  • 7
  • 17

1 Answers1

1

I would get rid of the local/server status field and introduce a timestamp/version field which is set to the current date once one of two sides alters the record. Thus you always know which of the rows is the most current record (the one with the highest timestamp). For your revert to older version features I'd consider to present the option to step back/forward one version of the record. If the user chose one older version to be reverted to, I'd save a new copy of the version with the current timestamp.

You might be able to implement an easy sync, by transfering either all or a subset of (unknown timestamps) records on your protocol. Consider dumping a list of timestamps for a given table on the client&server side, comparing the lists on deltas and prepare the insert statments for the other side. Et voila, sync is done. Only thing you would need to always consider is that you do not update a row, but create a new version with a new timestamp.

This is pretty much best practice in data warehouse systems. (No update philosophy)

If you find yourself having a large number of versions per record, you might throw away the most oldest records periodically, if there are N newer versions.

Tobias N. Sasse
  • 2,457
  • 1
  • 19
  • 14