I know that RDC is how the Distributed File System Replication (DFSR) keeps data on all shared devices in an active directory synced together. I only understood that RDC splits data into chunks, then it hashes each one of those chunks into what's called a signature. The set of signatures is transferred from server to client. The client compares the server signatures to its own. The client then requests the server send only the data for signatures that are not already on the client.
What I don't understand is this quote from Microsoft:
"RDC divides a file's data into chunks by computing the local maxima of a fingerprinting function that is computed at every byte position in the file. A fingerprinting function is a hash function that can be computed incrementally. For example, if you compute the function F over a range of bytes from the file, Bi...Bj, it should then be possible to compute F(Bi+1...Bj+1) incrementally by adding the byte Bj+1 and subtracting the byte Bi. The range of bytes from the file, Bi...Bj, is called the hash window. The length of this window, in bytes, is called the hash window size.
The RDC library's FilterMax signature generator "slides" the hash window across the entire file by adding the byte at the leading edge and subtracting the byte at the trailing edge of the window. Meanwhile, the generator continually examines the sequence of fingerprint function values over a given range of bytes, called the horizon size. If a fingerprint function value is a local maximum within the range, its byte position is chosen as a "cut point," or chunk boundary.
After the file has been divided into chunks, the signature generator computes a strong hash value (an MD4 hash), called a signature, for each chunk. The signatures can be used to compare the contents of two arbitrarily different versions of a file.
Because the size of the signature file grows linearly with the size of the original file, comparing very large files can be expensive. This cost is reduced dramatically by applying the RDC algorithm recursively to the signature files. For example, if the original file size is 9 GB, the signature file size would typically be about 81 MB. If the RDC algorithm is applied to the signature file, the resulting second-level signature file size would be about 5.7 MB."
What I don't understand is two things: What does any of this "can be computed incrementally" stuff have to do with how RDC works? And how does recursivity help reduce bandwidth?