The following repository contains a very fast and elegant Java implementation of Rabin's fingerprint: https://github.com/themadcreator/rabinfingerprint

However, the largest polynomial that may be used in the optimized implementation is 54 bits.

I want to reduce the probability of error.

Rabin [1] suggests two ways to lower the probability of error:

  • The probability of a wrong output will be lowered by increasing the value of k. This will require a larger word-length.
  • The probability can also be lowered by using two different irreducible polynomials P1(t) and P2(t) of the same degree k. The algorithm is then run twice by interleaving steps, one time with P1(t) and another time with P2(t). Since the error probabilities are independent ....

(from CMPUT690 Term Project)

If I run the algorithm twice, how do I combine the 2 fingerprints without undermining my objective to reduce the probability of error?

  • simply add or multiply the 2 fingerprints?
  • use the output of the first run as the base fingerprint of the second run?

It is not clear to me what "interleaving steps" means. I need to save the fingerprint as a 64-bit long number.

Thanks.

1 Answer

You can't. What Rabin suggests is effectively running the algorithm twice with different irreducible polynomials and then concatenating the output, which would give you 108 bits in your case. Thing is, there's no way to compress that down to 64 bits without throwing away most of the error reduction: by the pigeonhole principle, the absolute lowest probability of error you could hope for with any algorithm is

  • about 1/2^56 when you use a 56-bit fingerprint
  • about 1/2^64 when you use a 64-bit fingerprint
  • about 1/2^128 when you use a 128-bit fingerprint

and since Rabin's algorithm comes close to those bounds, going from a 54-bit to a 64-bit fingerprint will give at most a ~2^10 = ~1,000-fold reduction in the error.

If that improvement is worth your time, though, your best option is to calculate the two 54-bit fingerprints, throw away the 22 highest-order bits from each of them (to get two 32-bit fingerprints), and then concatenate those to get a 64-bit fingerprint.
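As a rough illustration of that truncate-and-concatenate step, here is a minimal sketch. The class and method names (FingerprintCombiner, combine) are made up for this example and are not part of the rabinfingerprint library. It assumes you already have the two 54-bit fingerprints as long values, one from each polynomial, and it only does the bit packing: the low 32 bits of each are kept and everything above them is dropped.

```java
// Hypothetical helper, not part of the rabinfingerprint library.
public final class FingerprintCombiner {

    // Mask that keeps only the 32 lowest-order bits of a long.
    private static final long LOW_32_MASK = 0xFFFFFFFFL;

    private FingerprintCombiner() {
        // Static utility class; no instances.
    }

    /**
     * Packs two fingerprints into one 64-bit value.
     *
     * @param fp1 fingerprint computed with the first polynomial
     * @param fp2 fingerprint computed with the second polynomial
     * @return low 32 bits of fp1 in the upper half, low 32 bits of fp2 in the lower half
     */
    public static long combine(long fp1, long fp2) {
        long upperHalf = (fp1 & LOW_32_MASK) << 32; // drop high-order bits, move to top
        long lowerHalf = fp2 & LOW_32_MASK;         // drop high-order bits, keep at bottom
        return upperHalf | lowerHalf;
    }
}
```

Note that the result can come out negative when printed as a signed long, because bit 63 may be set. That doesn't matter if you only compare fingerprints for equality.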

Andy Jones