
Alice and Bob each have a set of ~10000 pieces of data (each smaller than 65536 bytes, usually much smaller), and every piece has a 12-byte ID (timestamp + CRC32). They both want to end up with the union of the two sets. The sets differ only slightly: perhaps Alice or Bob has gained 10 new pieces, so they should work out which pieces the other side is missing and send only those.

Alice and Bob have established a TLS connection (with somewhat complex access control involving client certificates and their own CA). How should they carry out the replication?

So far, they've been:

  1. building an array of IDs
  2. using librsync to replicate it to the other party
  3. the other party, having both arrays, would determine the appropriate actions and send/request missing data pieces
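Step 3 above amounts to a set difference over the two ID arrays. A minimal sketch, assuming the 12-byte ID is an 8-byte big-endian timestamp followed by the 4-byte CRC32 of the payload (the question does not specify the layout); `make_id` and `plan_exchange` are hypothetical names used only for illustration:

```python
import struct
import zlib


def make_id(timestamp: int, payload: bytes) -> bytes:
    """Build a 12-byte ID: 8-byte timestamp + 4-byte CRC32 (assumed layout)."""
    return struct.pack(">QI", timestamp, zlib.crc32(payload))


def plan_exchange(local_ids, remote_ids):
    """Return (IDs to send to the peer, IDs to request from the peer)."""
    local, remote = set(local_ids), set(remote_ids)
    to_send = sorted(local - remote)      # the peer lacks these
    to_request = sorted(remote - local)   # we lack these
    return to_send, to_request
```

With ~10000 IDs of 12 bytes each, the whole array is about 120 kB, so the cost of this step is dominated by transferring the array rather than by the comparison itself.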

If the connection breaks, they account for any pieces of data already received and restart the procedure until no changes need to be sent.
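The restart behaviour could look roughly like the sketch below. It assumes the local store is a dict keyed by ID and that a hypothetical `fetch` callable raises `ConnectionError` when the link drops; neither detail comes from the question.

```python
def sync_with_retries(store, remote_ids, fetch, max_attempts=5):
    """Fetch missing pieces, restarting after a broken connection.

    Pieces received before the break stay in `store`, so each restart
    only asks for what is still missing.
    """
    for _ in range(max_attempts):
        missing = [i for i in remote_ids if i not in store]
        if not missing:
            return True  # nothing left to transfer
        try:
            for item_id in missing:
                store[item_id] = fetch(item_id)
        except ConnectionError:
            continue  # keep what arrived, then recompute and retry
    return all(i in store for i in remote_ids)
```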

Is there a better way to do that? An existing protocol, maybe?

aitap

1 Answer

If you trust rsync, then modify the data to be indexed by your current ID plus whether it is from Alice or Bob. Then Alice uses rsync to get data to Bob and Bob uses rsync to get data from Alice. Both process data independently thereafter.

If you want something more efficient under the hood, you can use cryptographically signed checkpoints that mark the point up to which all data has been received. Once one party has sent a checkpoint and the other has verified that their own data matches it, all the data before that checkpoint is known to match, and neither side tries to synchronize it again. If your timestamps represent when the data came into the possession of Alice and/or Bob, they can be used for this purpose. Otherwise you can add an ID for that purpose.
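One way such a checkpoint might be built is sketched below. It is only an illustration under several assumptions: the first 8 bytes of each ID are a big-endian timestamp, the checkpoint covers every ID at or before a cutoff, and an HMAC over a shared key stands in for a real signature (with client certificates already in place, an asymmetric signature would be the more natural choice). All function names are hypothetical.

```python
import hashlib
import hmac
import struct


def id_timestamp(item_id: bytes) -> int:
    """Extract the timestamp, assuming it is the first 8 bytes, big-endian."""
    return struct.unpack(">Q", item_id[:8])[0]


def checkpoint_digest(ids, cutoff: int) -> bytes:
    """Hash all IDs at or before `cutoff`, in sorted order."""
    h = hashlib.sha256()
    for item_id in sorted(i for i in ids if id_timestamp(i) <= cutoff):
        h.update(item_id)  # fixed-length IDs, so concatenation is unambiguous
    return h.digest()


def make_checkpoint(key: bytes, ids, cutoff: int):
    """Return (cutoff, tag); the HMAC tag is a stand-in for a signature."""
    msg = struct.pack(">Q", cutoff) + checkpoint_digest(ids, cutoff)
    return cutoff, hmac.new(key, msg, hashlib.sha256).digest()


def verify_checkpoint(key: bytes, ids, cutoff: int, tag: bytes) -> bool:
    """True if our own IDs up to `cutoff` produce the same tag."""
    _, expected = make_checkpoint(key, ids, cutoff)
    return hmac.compare_digest(expected, tag)
```

If verification succeeds, both sides can exchange only IDs newer than the cutoff on later runs; if it fails, they fall back to comparing the full ID arrays.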

btilly
  • I can't guarantee that no data older than a given timestamp will show up (e.g. Bob can lose his entire database and ask Alice for another copy), but I will add a field, "sync data newer than...", and fill it with (current date-time - user-selectable offset). – aitap Sep 22 '17 at 14:35
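The "sync data newer than..." field from the comment could be applied as a simple filter before the ID arrays are exchanged. A rough sketch, again assuming the ID starts with an 8-byte big-endian timestamp in seconds; the function names are hypothetical:

```python
import struct
import time


def sync_window_start(offset_seconds: int, now=None) -> int:
    """Current time minus the user-selected offset, never below zero."""
    now = int(time.time()) if now is None else now
    return max(0, now - offset_seconds)


def ids_newer_than(ids, cutoff: int):
    """Keep only IDs whose (assumed) timestamp is at or after `cutoff`."""
    return [i for i in ids if struct.unpack(">Q", i[:8])[0] >= cutoff]
```

Choosing an offset at least as large as the total number of seconds covered makes the cutoff 0, which gives a full resync for the lost-database case.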