
I'm looking for any information or algorithms that allow differential file saving and merging.

To be clearer: when the content of a file is modified, the original file should stay the same, and every modification must be saved in a separate file (the same idea as a differential backup, but for individual files). When the file is accessed, the latest version should be reconstructed from the original file and the last differential file.

What I need to do is described in the diagram below:

[diagram: image not included]

jps
  • Do you know about [`git`](https://en.wikipedia.org/wiki/Git)? It doesn't do precisely what you describe, but it comes very close. – Stef Oct 05 '20 at 12:09
  • There also appears to be an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) in your question. Why do you need the diff tool to work precisely this way? Why not save the latest version of the file as a whole new file, then compute the diff file when you need it? There are several good algorithms and tools for computing the difference between files, including [GNU diffutils](https://www.gnu.org/software/diffutils/) and [git diff](https://git-scm.com/docs/git-diff). – Stef Oct 05 '20 at 12:12
  • Well, it's for a new approach that I'm working on for my PhD. I'm storing the file reference in a blockchain and I need to manage updates. A blockchain does not allow updates or deletion, so in order to manage updates I need to save the difference in a separate file and then reconstruct the file whenever it is needed (knowing that the same file could belong to many users). – Yassine El Khanboubi Oct 05 '20 at 12:16

1 Answer


To calculate diffs, you could use something like diff_match_patch.

For each file version, you could store a series of DeltaDiff entries.

Each DeltaDiff would be a tuple of one of two types: INSERT or DELETE.

You could then store the series of DeltaDiff entries as follows:

Diff = [DeltaDiff_1, DeltaDiff_2, ..., DeltaDiff_n] = [

    (INSERT, byte offset relative to the initial file, bytes),
    (DELETE, byte offset relative to the initial file, length),
    ...
]
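As a rough illustration of the structure above, here is a minimal sketch that produces such a list from two byte strings. It uses Python's standard-library `difflib.SequenceMatcher` rather than diff_match_patch, purely to stay dependency-free. The names `compute_diff`, `DeltaDiff`, `INSERT` and `DELETE` are hypothetical, and representing a replaced region as a DELETE followed by an INSERT is an assumption of this sketch.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# Hypothetical tags and tuple layout, mirroring the description above:
# (INSERT, offset in the old file, bytes) or (DELETE, offset in the old file, length).
INSERT, DELETE = "INSERT", "DELETE"
DeltaDiff = Tuple[str, int, Union[bytes, int]]


def compute_diff(old: bytes, new: bytes) -> List[DeltaDiff]:
    """Return INSERT/DELETE entries that turn `old` into `new`.

    All offsets refer to positions in `old`, and entries come out
    sorted by offset. A replaced region is emitted as a DELETE of the
    old bytes followed by an INSERT at the end of the deleted range.
    """
    # autojunk=False disables the heuristic that ignores very frequent
    # elements, which is not appropriate for arbitrary binary data.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    diff: List[DeltaDiff] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            diff.append((DELETE, i1, i2 - i1))
        if tag in ("insert", "replace"):
            diff.append((INSERT, i2, new[j1:j2]))
    return diff
```

`SequenceMatcher` can be slow on large inputs, so for big binary files a dedicated binary delta tool would probably be a better fit; the sketch is only meant to show the shape of the data.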

Applying the DeltaDiffs to the initial file gives you the next file version, and so on. For example:

FileVersion1 + Diff1 -> FileVersion2 + Diff2 -> FileVersion3 + ....
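The chain above could be replayed along these lines. This is a minimal sketch under stated assumptions, not a definitive implementation: `apply_diff` and `reconstruct` are hypothetical names, the entries of one Diff are assumed to be sorted by offset and non-overlapping, and the offsets in each Diff are assumed to refer to the version that Diff is applied to.

```python
from typing import List, Tuple, Union

# Hypothetical tags and tuple layout:
# (INSERT, offset in the base version, bytes) or (DELETE, offset in the base version, length).
INSERT, DELETE = "INSERT", "DELETE"
DeltaDiff = Tuple[str, int, Union[bytes, int]]


def apply_diff(base: bytes, diff: List[DeltaDiff]) -> bytes:
    """Apply one Diff to `base` and return the next version.

    Assumes entries are sorted by offset and do not overlap.
    """
    out = bytearray()
    cursor = 0  # position in `base` up to which bytes have been consumed
    for kind, offset, payload in diff:
        if offset < cursor or offset > len(base):
            raise ValueError(f"offset {offset} is out of order or out of range")
        out += base[cursor:offset]  # copy the unchanged bytes before this entry
        if kind == INSERT:
            out += payload          # payload holds the bytes to insert
            cursor = offset
        elif kind == DELETE:
            if offset + payload > len(base):
                raise ValueError("DELETE runs past the end of the file")
            cursor = offset + payload  # payload holds the number of bytes to skip
        else:
            raise ValueError(f"unknown DeltaDiff type: {kind!r}")
    out += base[cursor:]  # copy whatever is left after the last entry
    return bytes(out)


def reconstruct(original: bytes, diffs: List[List[DeltaDiff]]) -> bytes:
    """Replay Diff1, Diff2, ... in order, starting from the original file."""
    version = original
    for diff in diffs:
        version = apply_diff(version, diff)
    return version


# Example data: two successive edits to a small file.
version1 = b"hello world"
diff1 = [(INSERT, 6, b"brave ")]  # offsets refer to version1
diff2 = [(DELETE, 0, 6)]          # offsets refer to version2
latest = reconstruct(version1, [diff1, diff2])
```

Because the original bytes are never modified, any intermediate version can be recovered by replaying only a prefix of the Diff list.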

StPiere
  • Thank you. This only compares text and text files; I need to compute the difference for any type of file, so I believe it would have to be a binary comparison. – Yassine El Khanboubi Oct 05 '20 at 13:34
  • I had binary files in mind. You could treat every binary file as a text file if you wish, by working with bytes as chars. – StPiere Oct 05 '20 at 18:55