I've been researching Rabin fingerprinting for the past couple of days. While the general idea is simple enough, I'm having significant trouble understanding the implementations circulating around the net. In particular, all of them seem to be derived from the original LBFS paper. For example, librabinpoly defines the sliding window as:
    static u_int64_t slide8(RabinPoly *rp, unsigned char m) {
        rp->circbuf_pos++;
        if (rp->circbuf_pos >= rp->window_size) {
            rp->circbuf_pos = 0;
        }
        unsigned char om = rp->circbuf[rp->circbuf_pos];
        rp->circbuf[rp->circbuf_pos] = m;
        return rp->fingerprint = append8(rp, rp->fingerprint ^ rp->U[om], m);
    }

    static u_int64_t append8(RabinPoly *rp, u_int64_t p, unsigned char m) {
        return ((p << 8) | m) ^ rp->T[p >> rp->shift];
    }
where the U and T tables are generated from the initial polynomial. None of the papers on Rabin fingerprinting that I have read discuss the use of these two tables or the XOR operations. My gut feeling is that this has something to do with the modular arithmetic, but I'm not entirely sure. Git's source code also uses Rabin fingerprinting, but instead of deriving the tables dynamically it ships a set of precomputed ones. So my question is: what exactly do those XOR operations achieve, and why does the code look so different from the 'canonical' explanation of the algorithm?
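
To make the question concrete, here is a minimal sketch of what I suspect the two tables encode. Everything in it is my own assumption rather than librabinpoly's actual code: the degree-16 polynomial `POLY` is an arbitrary example (not checked for irreducibility), `WINDOW` is a made-up window size, and `append_bit`, `slow_fingerprint`, `build_tables` and `count_mismatches` are hypothetical helper names. The guess is that `T[j]` holds the reduction of the 8 bits that overflow past the polynomial's degree when the fingerprint is shifted left by a byte, and that `U[b]` holds the contribution of a byte `b` sitting at the oldest position in the window, so that XORing it in removes that byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEG    16          /* degree of the example polynomial (assumption) */
#define POLY   0x1002DULL  /* x^16 + x^5 + x^3 + x^2 + 1, arbitrary example */
#define WINDOW 4           /* sliding-window size in bytes (assumption) */

static uint64_t T[256], U[256];

/* Multiply f by x, add one bit, reduce modulo POLY. Requires f < 2^DEG. */
static uint64_t append_bit(uint64_t f, unsigned bit) {
    f = (f << 1) | bit;
    if (f >> DEG) f ^= POLY;   /* subtraction in GF(2) is XOR */
    return f;
}

/* Reference: fingerprint of n bytes computed bit by bit, without tables. */
static uint64_t slow_fingerprint(const unsigned char *buf, size_t n) {
    uint64_t f = 0;
    for (size_t i = 0; i < n; i++)
        for (int b = 7; b >= 0; b--)
            f = append_bit(f, (buf[i] >> b) & 1u);
    return f;
}

static void build_tables(void) {
    assert(DEG >= 8 && DEG <= 56);  /* keeps p << 8 inside 64 bits */
    for (unsigned j = 0; j < 256; j++) {
        uint64_t t = j, u = j;
        /* t = j * x^DEG mod POLY */
        for (int i = 0; i < DEG; i++) t = append_bit(t, 0);
        /* OR in j << DEG so the XOR in append8 also clears the overflow bits */
        T[j] = t | ((uint64_t)j << DEG);
        /* u = j * x^(8*(WINDOW-1)) mod POLY: weight of the oldest byte */
        for (int i = 0; i < 8 * (WINDOW - 1); i++) u = append_bit(u, 0);
        U[j] = u;
    }
}

/* Same shape as the append8 quoted above, with shift = DEG - 8. */
static uint64_t append8(uint64_t p, unsigned char m) {
    return ((p << 8) | m) ^ T[p >> (DEG - 8)];
}

/* Rolls over data and counts positions where the table-driven fingerprint
   differs from the bit-by-bit fingerprint of the current window. */
size_t count_mismatches(const unsigned char *data, size_t n) {
    unsigned char window[WINDOW] = {0};
    size_t pos = 0, bad = 0;
    uint64_t fp = 0;
    build_tables();
    for (size_t i = 0; i < n; i++) {
        unsigned char old = window[pos];      /* byte leaving the window */
        window[pos] = data[i];
        pos = (pos + 1) % WINDOW;
        fp = append8(fp ^ U[old], data[i]);   /* drop old byte, add new one */
        size_t len = (i + 1 < WINDOW) ? i + 1 : WINDOW;
        if (fp != slow_fingerprint(data + i + 1 - len, len)) bad++;
    }
    return bad;
}
```

If this reading is right, `count_mismatches` should return 0 for any input, which would mean the XORs are just polynomial subtraction and reduction modulo the polynomial, done a byte at a time via lookup instead of a bit at a time.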