
If I want to combine two numbers (Int, Long, ...) n1 and n2 in a non-commutative way, p*n1 + n2, where p is an arbitrary prime, seems a reasonable enough choice.
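For concreteness, a minimal sketch of that integer version. The object name `IntCombine` and the prime 31 are only illustrative choices, not part of any library:

```scala
object IntCombine {
  // Order-sensitive combination of two numbers: p * n1 + n2.
  // Long arithmetic wraps silently on overflow, which is usually
  // acceptable when the result is only used as a hash value.
  def combine(p: Long, n1: Long, n2: Long): Long = p * n1 + n2

  val p: Long = 31L               // example prime (assumption)
  val ab: Long = combine(p, 3L, 5L) // 31 * 3 + 5
  val ba: Long = combine(p, 5L, 3L) // 31 * 5 + 3
}
```

Swapping the arguments gives a different value, which is the non-commutativity I am after.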

As many hashing options return a byte array, though, I am now trying to substitute the numbers with byte arrays.

Assume a,b:Array[Byte] are of the same length.

+ simply becomes an xor

but what should I use as a "Multiplication"?

Here p: Long is an arbitrary prime and a: Array[Byte] is of arbitrary length.

I could, of course, convert a to a long, multiply, then convert the result back to an Array of Bytes. The problem with that is that I will need "p*a" to be of the same length as a for the subsequent xor to make sense. I could circumvent this by zero-extending the shorter of the two byte arrays, but then the byte arrays quickly grow in length.

I could, on the other hand, convert p to a byte array and xor it with a. Here, the issue is that then (p*(p*a+b)+c) becomes (a+b+c), which is commutative, which we don't want.
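A small sketch of why this collapses, assuming p has already been converted to a byte array of the same length as a (the names `XorCollapse`, `xor` and `step` are hypothetical):

```scala
object XorCollapse {
  // Element-wise xor of two equal-length byte arrays.
  def xor(x: Array[Byte], y: Array[Byte]): Array[Byte] = {
    require(x.length == y.length, "arrays must have the same length")
    x.zip(y).map { case (u, v) => (u ^ v).toByte }
  }

  // "p*x + y" with both the multiplication and the addition replaced by xor.
  def step(p: Array[Byte], x: Array[Byte], y: Array[Byte]): Array[Byte] =
    xor(xor(p, x), y)

  // Nesting two steps: p ^ (p ^ a ^ b) ^ c. The two copies of p cancel,
  // leaving a ^ b ^ c, which does not depend on the order of a, b and c.
  def nested(p: Array[Byte], a: Array[Byte], b: Array[Byte], c: Array[Byte]): Array[Byte] =
    step(p, step(p, a, b), c)
}
```

Since xor is its own inverse, every second application of p undoes the previous one.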

I could add p to every byte in the array (throwing away the overflow).

I could add p to every byte in the array (not throwing away the overflow).

I could circular shift a by some f(p) bits (and hope it doesn't end up becoming a again)

And I could think of a lot more nonsense. But what should I do? What actually makes sense?

User1291

1 Answer


If you want to mimic the original idea of multiplying by a prime, the obvious generalization is to do arithmetic in the Galois field GF(2^8); see https://en.wikipedia.org/wiki/Finite_field_arithmetic. Note that you can essentially use log and antilog tables of size 256 to replace multiplication with not much more than table lookup: https://en.wikipedia.org/wiki/Finite_field_arithmetic#Implementation_tricks. Arithmetic over a finite field of any sort will have many of the nice properties of arithmetic modulo a prime; arithmetic modulo p is GF(p), or GF(p^1) if you prefer.
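As a rough sketch of what that could look like in Scala: the code below multiplies every byte of a by a constant field element p and then xors with b. It assumes the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the one AES uses), which is only one possible choice, and it uses shift-and-xor multiplication rather than log/antilog tables to keep the example short. The names `Gf256`, `mul` and `combine` are made up for illustration.

```scala
object Gf256 {
  // Multiply two elements of GF(2^8), given as ints in 0..255.
  // Assumption: reduction polynomial 0x11B (x^8 + x^4 + x^3 + x + 1).
  def mul(x: Int, y: Int): Int = {
    var a = x & 0xFF
    var b = y & 0xFF
    var acc = 0
    while (b != 0) {
      if ((b & 1) != 0) acc ^= a       // add (xor) the current multiple of x
      a <<= 1                          // multiply a by the polynomial "x"
      if ((a & 0x100) != 0) a ^= 0x11B // reduce back into 8 bits
      b >>= 1
    }
    acc
  }

  // Byte-wise "p*a + b": multiply each byte of a by p, then xor with b.
  // The result has the same length as a and b.
  def combine(p: Int, a: Array[Byte], b: Array[Byte]): Array[Byte] = {
    require(a.length == b.length, "arrays must have the same length")
    require((p & 0xFF) > 1, "p should be a field element other than 0 and 1")
    a.zip(b).map { case (x, y) => (mul(p, x & 0xFF) ^ (y & 0xFF)).toByte }
  }
}
```

Because multiplication by a fixed nonzero element is invertible in a field, no information from a is lost, and unlike plain xor, applying p twice does not cancel out.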

However, this is all rather untried and perhaps a little high-flown. Other options include checksum algorithms such as https://en.wikipedia.org/wiki/Adler-32 or, if you already have a hash algorithm that maps long strings into a short array of bytes, simply concatenating the two arrays of bytes to be combined and running the result through the hash algorithm again, perhaps with some padding before and after to give you some parameters you can play with if you need to vary or tune things.
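A minimal sketch of the concatenate-and-rehash option, assuming SHA-256 from the JDK's `java.security.MessageDigest` stands in for whatever hash you already use (the object name `Rehash` is hypothetical, and the optional padding is left out):

```scala
import java.security.MessageDigest

object Rehash {
  // Feed a and then b into the digest, which is equivalent to hashing
  // the concatenation a ++ b. Assumption: SHA-256 as the hash algorithm.
  def combine(a: Array[Byte], b: Array[Byte]): Array[Byte] = {
    val md = MessageDigest.getInstance("SHA-256")
    md.update(a)
    md.update(b)
    md.digest() // always 32 bytes for SHA-256, regardless of input lengths
  }
}
```

The output length is fixed by the hash, so repeated combination never grows the array, and the order of the inputs matters because the concatenation differs.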

mcdowella