I'm writing some C code for a research project in number theory, which requires doing a lot of operations in modular arithmetic, with many different moduli. To put it simply: I need to perform the operation (a * b) % n
many, many times.
The code is meant to run on a PC with 64-bit words, and all the moduli are known to be less than 2^64, so all the operands are represented as unsigned 64-bit integers.
My question is: Would using Montgomery modular multiplication (which uses only additions, multiplications, and shifts) instead of the C modulo operator %
(which compiles to a division, since a % n = a - n*(a / n))
result in faster execution?
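For context, here is the baseline I'd be comparing against. Note one subtlety: since the moduli can approach 2^64, the product a * b must be computed in 128 bits before reducing, or it would silently wrap modulo 2^64. A minimal sketch, assuming a GCC/Clang target that provides the unsigned __int128 extension (the name mulmod is my own):

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Baseline method: one 64x64->128 multiplication, then one
   128-by-64 division performed implicitly by the % operator. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
    return (uint64_t)(((u128)a * b) % n);
}
```

On x86-64 this compiles to essentially one MUL and one DIV, which is the cost model the question is about.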
Intuitively, I would say that the answer is: No, because (word-size) divisions on a PC are not that much more computationally expensive than (word-size) multiplications, and Montgomery reduction would actually add overhead.
Thanks for any suggestions.
Update: On the one hand, according to Paul Ogilvie (see his comment below), (a * b) % n
requires 1 multiplication and 1 division. On the other hand, Montgomery multiplication requires 3 multiplications (ignoring the binary shifts, and the operations needed to convert the operands to their Montgomery representations and back, since those are done only once for each modulus n). So it would seem that Montgomery is faster than `%` as soon as a division takes more than twice as long as a multiplication...