I recently ran into this same problem.
First of all, I'm going to assume you mean 64-bit integers (after reading your comments), but I think this applies to big integers as well, because a naive multiplication there means doubling the word size and will be slow too.
Option 1
We use the following property:
Proposition. a*b mod m = (a - m)*(b - m) mod m
Proof.
(a - m)*(b - m) mod m =
(a*b - (a+b)*m + m^2) mod m =
(a*b - ((a+b) - m)*m) mod m =
a*b mod m, since ((a+b) - m)*m is a multiple of m and vanishes mod m.
q.e.d.
Moreover, if a and b are both close to m (and below it), then (a - m)*(b - m) mod m = (a - m)*(b - m): the two differences are small, so their product is already less than m and needs no reduction at all. You will still need to address the case where a, b > m (reduce them first), but note that (m - a)*(m - b) mod m = a*b mod m, which lets you work with the non-negative differences, is a corollary of the Proposition above; and of course don't do this when the differences are very big (a small modulus with a big a or b, or vice versa) or the product of the differences will itself overflow. A minimal sketch of this idea follows.
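Here is a minimal sketch of this in C (the function name mul_mod_near is just for illustration): it assumes a < m, b < m, and that (m - a)*(m - b) fits in a uint64_t.

#include <stdint.h>

/* Sketch of Option 1: only valid when a < m, b < m and the product of the
 * differences (m - a)*(m - b) does not overflow a uint64_t. */
uint64_t mul_mod_near(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t da = m - a;      /* -(a - m), kept as a non-negative value */
    uint64_t db = m - b;      /* -(b - m), kept as a non-negative value */
    return (da * db) % m;     /* equals a*b mod m by the Proposition    */
}

For example, with a = m - 3 and b = m - 5 the product of the differences is just 15, so the answer is 15 % m no matter how large m is.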
Option 2
From Wikipedia:
uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t d = 0, mp2 = m >> 1;
    int i;
    if (a >= m) a %= m;
    if (b >= m) b %= m;
    for (i = 0; i < 64; ++i)
    {
        d = (d > mp2) ? (d << 1) - m : d << 1;  /* d = 2*d (mod m) */
        if (a & 0x8000000000000000ULL)
            d += b;                             /* current top bit of a is set: add b */
        if (d >= m) d -= m;                     /* keep d reduced */
        a <<= 1;                                /* move on to the next bit of a */
    }
    return d;
}
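This is the classic double-and-add (shift-and-add) approach: it walks through the bits of a from the most significant one down, doubling the accumulator d modulo m at every step and adding b modulo m whenever the corresponding bit of a is set, so the full 128-bit product is never formed. The price is 64 loop iterations instead of a single multiplication.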
Also, assuming long double and 32- or 64-bit integers (not arbitrary precision), you can exploit the fact that the hardware keeps opposite halves of the product for the two types. Again from Wikipedia:
On computer architectures where an extended precision format with at least 64 bits of mantissa is available (such as the long double type of most x86 C compilers), the following routine is faster than any algorithmic solution: it employs the trick that, by hardware, floating-point multiplication keeps the most significant bits of the product, while integer multiplication keeps the least significant bits.
And do:
uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
{
    long double x;
    uint64_t c;
    int64_t r;
    if (a >= m) a %= m;
    if (b >= m) b %= m;
    x = a;
    c = x * b / m;                              /* floating point: c is (approximately) the quotient a*b / m */
    r = (int64_t)(a * b - c * m) % (int64_t)m;  /* integer arithmetic: low bits of a*b - c*m, wrapping mod 2^64 */
    return r < 0 ? r + m : r;                   /* fix the sign if c was rounded one too high */
}
These are guaranteed not to overflow.
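As a quick sanity check, here is a minimal usage sketch (my own test values, assuming one of the mul_mod definitions above is compiled in): with a = b = m - 1 the result must be 1, since (m - 1)^2 = m^2 - 2*m + 1 ≡ 1 (mod m).

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t m = (1ULL << 61) - 1;   /* a large modulus (2^61 - 1)                         */
    uint64_t a = m - 1, b = m - 1;   /* operands whose plain 64-bit product would overflow */
    printf("%" PRIu64 "\n", mul_mod(a, b, m));   /* expected output: 1 */
    return 0;
}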