
I want to know how Dropbox is able to synchronize large data files without replacing or re-uploading the whole file to the Dropbox servers.

Example: an encrypted zip archive

Suppose I have a 1 GB encrypted zip archive that is fully synchronized between my computer and the Dropbox servers.

On my computer I add a file of about 5 MB to that archive and save it.

Dropbox is able to synchronize the archive without re-uploading the whole file; instead, it only transfers the small change I made.

TrueCrypt containers also sync this way.

Any keywords, ideas, topics, reviews, links, or code would be greatly appreciated.

SilverlightFox
Michael Emad
  • It is the simplest trick imaginable, completely anathema to the way programmers think. Don't show a progress bar. Just make it happen, unobservable by the user. It is a shell extension, so it's easy to make it look like Windows is doing the copying when actual updating needs to take place. And of course, if it is slow, it is because Windows sucks. You see the Explorer progress bar. Since you can't see what is going on, you'll need a tool like Wireshark to observe it. – Hans Passant Aug 15 '11 at 01:06
  • That's a good illusion trick, but I'm not sure it's what Michael is asking. He's asking how to determine which parts of the file have changed, so that you don't need to re-upload 1 GB of data because you changed 5 MB of it. – Russ Clarke Aug 15 '11 at 11:18
  • @Russ C, you are correct. – Michael Emad Aug 17 '11 at 14:42

2 Answers


Dropbox uses the rsync algorithm to generate a delta describing the difference between file A1 and file A2. Only the delta (usually much smaller than A2) is uploaded to the Dropbox servers, since Dropbox already has file A1. The delta can then be applied to file A1, turning it into file A2.

You can learn more about the algorithm here: http://en.wikipedia.org/wiki/Rdiff-backup#Variations

The source code for the library behind the delta creation can be found here: http://librsync.sourceforge.net/
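To make the delta idea in this answer concrete, here is a heavily simplified Python sketch of the rsync technique. It is not Dropbox's code and not librsync's API. The receiver publishes, for each fixed-size block of its old copy, the rsync-style weak checksum (a plain byte sum and a position-weighted sum, both mod 2^16) plus a strong hash. SHA-256 stands in for rsync's MD4/MD5 here. The sender slides a window over the new file one byte at a time, using the rolling update so each step is O(1). It emits "copy block k" when both hashes match and literal bytes otherwise. The function names, the 64-byte demo block size and the `("copy", k)` / `("data", bytes)` delta format are hypothetical choices for illustration.

```python
import hashlib
import random

MOD = 1 << 16  # modulus for the two halves of the weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum: a = sum of bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1) instead of rehashing it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signature(old: bytes, size: int) -> dict:
    """Receiver side: weak checksum -> [(strong hash, block index)] for each full block."""
    sig: dict = {}
    for k in range(len(old) // size):
        chunk = old[k * size:(k + 1) * size]
        sig.setdefault(weak_checksum(chunk), []).append((hashlib.sha256(chunk).digest(), k))
    return sig


def delta(sig: dict, new: bytes, size: int) -> list:
    """Sender side: ("copy", k) for blocks the receiver already has, ("data", bytes) otherwise."""
    ops, literal = [], bytearray()
    i, n = 0, len(new)
    weak = None
    while i + size <= n:
        if weak is None:
            weak = weak_checksum(new[i:i + size])
        match = None
        # Cheap weak-checksum lookup first; confirm candidates with the strong hash.
        for strong, k in sig.get(weak, ()):
            if hashlib.sha256(new[i:i + size]).digest() == strong:
                match = k
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += size
            weak = None  # the next window starts fresh right after the matched block
        else:
            literal.append(new[i])
            if i + size < n:
                weak = roll(*weak, new[i], new[i + size], size)
            i += 1
    literal += new[i:]  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, size: int) -> bytes:
    """Receiver side: rebuild the new file from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)


old = random.Random(0).randbytes(4096)           # stand-in for the big synced file
new = old[:1000] + b"INSERTED" * 5 + old[1000:]  # 40 bytes inserted mid-file
ops = delta(signature(old, 64), new, 64)
assert patch(old, ops, 64) == new
sent = sum(len(v) for kind, v in ops if kind == "data")
print(sent, "literal bytes sent instead of", len(new))  # 104 literal bytes sent instead of 4136
```

Because the window rolls byte by byte, the blocks after the insertion point are found again at their shifted offsets. Only the block that straddles the insertion, plus the inserted bytes, travels as literal data.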

G Chris DCosta

My first thought (it's late, sorry!) is that it might be hashing at the block level.

For example, it might generate a hash for each 64 KB segment and then upload the whole segment for every portion whose hash differs.
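A minimal Python sketch of the fixed-segment idea described in this answer. It assumes SHA-256 as the per-segment hash and compares each segment only against the segment at the same offset in the old file. Both are assumptions, and the helper names `segment_hashes` and `changed_segments` are hypothetical. The demo contrasts an in-place overwrite that keeps the file length the same with an insertion.

```python
import hashlib
import random

SEGMENT = 64 * 1024  # 64 KiB segments, as suggested in the answer


def segment_hashes(data: bytes, size: int = SEGMENT) -> list[bytes]:
    """Hash each fixed-size segment (the last one may be shorter)."""
    return [hashlib.sha256(data[i:i + size]).digest() for i in range(0, len(data), size)]


def changed_segments(old: bytes, new: bytes, size: int = SEGMENT) -> list[int]:
    """Indices of segments in `new` whose hash differs from the segment at the same offset in `old`."""
    old_h = segment_hashes(old, size)
    new_h = segment_hashes(new, size)
    return [k for k, h in enumerate(new_h) if k >= len(old_h) or old_h[k] != h]


data = random.Random(1).randbytes(1 << 20)               # 1 MiB = 16 segments
overwrite = data[:100_000] + b"X" * 10 + data[100_010:]  # same length, 10 bytes replaced
inserted = data[:100_000] + b"X" * 10 + data[100_000:]   # 10 bytes inserted
print(changed_segments(data, overwrite))       # [1]
print(len(changed_segments(data, inserted)))   # 16
```

The overwrite dirties a single segment, but the 10-byte insertion shifts every later segment off its old offset, so all of them hash differently. Detecting blocks at shifted offsets is what the rolling checksum in the rsync approach handles.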

Russ Clarke
  • If you had a 1 GB file and you put an encrypted zip of that file (say 500 MB) into your Dropbox folder, I would have thought that changing one byte of the source file would completely alter the contents of all parts of its zipped version, and Dropbox would have to upload the whole lot again. That is, I don't think the hashed-segments technique works within a zipped file. It should work at the file level, since each file in a ZIP archive is compressed/encrypted without reference to the other files. I guess that solid archives (http://en.wikipedia.org/wiki/Solid_compression) can't take advantage of this. – rossmcm Apr 10 '13 at 13:19