
I am interested in learning about the time complexity of multiplying large integers modulo a (large) constant using multiple processors. This basically boils down to plain integer multiplication, since division and remainder can also be implemented using multiplication (e.g. multiplication by a precomputed reciprocal, or Barrett reduction).
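To make the reduction step concrete, here is a rough sketch of Barrett reduction for a fixed modulus (my own illustrative code, not taken from any particular library), which replaces the division by two multiplications, a shift and at most one subtraction:

```python
# Rough sketch of Barrett reduction: reduce x modulo a fixed n using only
# multiplications, shifts and at most one subtraction (illustrative only).

def barrett_setup(n, k):
    # Precompute m = floor(4^k / n) once for the fixed modulus n, where k is
    # chosen so that 4^k > n^2 covers any product of two residues a, b < n.
    return (4 ** k) // n

def barrett_reduce(x, n, m, k):
    # Approximate quotient q = floor(x * m / 4^k); the error is at most 1,
    # so a single conditional subtraction corrects the remainder.
    q = (x * m) >> (2 * k)
    r = x - q * n
    if r >= n:
        r -= n
    return r

# Usage: multiply a * b mod n without a division in the hot path.
n = 2**61 - 1                  # example (large) constant modulus
k = n.bit_length()             # guarantees n < 2^k, hence n^2 < 4^k
m = barrett_setup(n, k)
a, b = 123456789123456789, 987654321987654321
print(barrett_reduce(a * b, n, m, k))   # same result as (a * b) % n
print((a * b) % n)
```

Since the modulus is constant, the precomputation happens once, and the cost per modular multiplication is dominated by the big-integer multiplications themselves.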

I know that the runtime of the currently known integer multiplication algorithms is roughly bounded by O(n * log n). My research has failed to establish whether this is for a single-core or a multi-core machine. However, I suspect it is for a single-core machine, as the algorithms seem to use a divide-and-conquer approach.

Now my question is: what are the currently known bounds on the time complexity of a parallel integer multiplication algorithm implemented on m cores? Can a time complexity of o(n) or less be achieved, given enough cores (i.e. if m is allowed to depend on n)? Here n denotes the input size (the number of bits) of the integers at hand.

So far in my research I have read several papers claiming a speedup using parallel FFT multiplication. Unfortunately these only report empirical speedups (e.g. "a 56% speed improvement using 6 cores on such-and-such a computer") and fail to state the theoretical speedup in terms of time complexity bounds.

I am aware that the "fastest" integer multiplication algorithm has not yet been found; this is an open problem in computer science. I am merely inquiring about the currently known bounds for such parallel algorithms.

Update #1: User @delnan linked to a wiki page about the NC complexity class. That page mentions that integer multiplication is in NC, meaning there exists an algorithm running in O((log n)^c) time on O(n^k) processors. This gets us closer to an answer. The part that's left unanswered for now is: what are the constants c and k for integer multiplication, and which parallel algorithm lends itself to this purpose?

Update #2: According to page 12 of 15 in this PDF file from a Computer Science course at Cornell University, integer multiplication is in NC with O(log n) time on O(n^2) processors. The PDF also sketches an example algorithm showing how to go about this. I'll write up a proper answer for this question shortly.
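To make that structure concrete, here is a small sequential simulation of such a scheme (my own illustrative code, not taken from the linked notes): all shifted partial products can be formed simultaneously, and the summation tree then has O(log n) levels; the notes achieve O(log n) overall by using 3-for-2 carry-save addition so that each level also has constant depth.

```python
# Sequential simulation of the parallel schoolbook scheme: n shifted partial
# products (each about n bits wide, hence the O(n^2) processors) are formed
# independently, then summed in a balanced tree of O(log n) rounds.  Each
# Python "+" below stands in for one addition that a real circuit would
# perform with carry-save adders.

def parallel_style_multiply(a, b):
    n = b.bit_length()
    # "Round 0": all partial products a * b_i * 2^i are independent of each other.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    if not partials:
        return 0
    # O(log n) tree rounds: in each round, disjoint pairs are added "in parallel".
    while len(partials) > 1:
        reduced = [x + y for x, y in zip(partials[0::2], partials[1::2])]
        if len(partials) % 2:
            reduced.append(partials[-1])
        partials = reduced
    return partials[0]

a, b = 123456789123456789, 987654321987654321
assert parallel_style_multiply(a, b) == a * b
```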

One last question to satisfy my curiosity: might anyone know something about the currently known time complexity for "just" O(n), O(sqrt(n)) or O(log n) processors?

  • You might have more luck over at [cs.se], or failing that, [cstheory.se] (I find it hard to judge the level of a question I can't answer). –  Oct 01 '14 at 11:54
  • For O(n) processors: I can't seem to find the right citation, but my strong expectation is that there exists an FFT-based method with running time O(polylog n), since the FFT is naturally parallel up to n processors. Such a method would scale down to lesser numbers of processors. – David Eisenstat Oct 01 '14 at 20:30
  • Thanks @DavidEisenstat. You are right, since FFT is a divide-and-conquer algorithm, it can be run in [polylogarithmic time](http://en.wikipedia.org/wiki/Time_complexity#Polylogarithmic_time) in parallel. In the case of an `O(n * log n)` FFT algorithm, this becomes `O(log n)` time on `O(n)` processors (see pages 5-6 of [this PDF](http://www.cs.berkeley.edu/~demmel/cs170_spr07/LectureNotes/Lecture_Parallelism_DC.pdf)). I'm not sure how this affects all the currently known multiplication algorithms, but it's safe to say that the ones that use FFT will adhere to FFT's currently known lower bound. – webdevelopersdiary Oct 02 '14 at 05:53
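Following up on the comments above, here is a bare-bones radix-2 FFT sketch (illustrative code only, not the algorithm from any of the linked papers) showing where the parallelism comes from: the two recursive calls at each level are independent, as are the butterflies within a level, so with on the order of n processors the parallel depth is O(log n).

```python
import cmath

def fft(coeffs):
    # Recursive radix-2 Cooley-Tukey FFT; len(coeffs) must be a power of two.
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    # The two recursive calls are independent; on a parallel machine they can
    # run simultaneously, so the recursion depth bounds the parallel time.
    even = fft(coeffs[0::2])
    odd = fft(coeffs[1::2])
    out = [0j] * n
    for k in range(n // 2):    # these n/2 butterflies are also independent
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

print(fft([1, 2, 3, 4]))       # 4-point DFT of [1, 2, 3, 4]
```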

1 Answer


The computational complexity of algorithms is not affected by parallelisation.

For sure, take a problem such as integer multiplication and you can find a variety of algorithms for solving it; these algorithms will exhibit a range of complexities. But given any algorithm, running it on p processors will, at theoretical best, give a speedup of p times. In terms of computational complexity this is like multiplying the existing complexity, call it O(f(n)), by a constant, giving O((1/p)*f(n)). As you know, multiplying a complexity by a constant doesn't change the complexity classification.
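Spelled out, with T_1(n) = O(f(n)) the serial running time and T_p(n) the best-case time on p processors, and under the assumption that p is a constant independent of n:

```latex
T_p(n) \;=\; \frac{T_1(n)}{p} \;=\; O\!\left(\tfrac{1}{p}\, f(n)\right) \;=\; O\!\left(f(n)\right)
\qquad \text{when } p = O(1).
```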

To take another, perhaps more practical, line of argument, changing the number of processors doesn't change the number of basic operations that an algorithm performs for any given problem at any given size -- except for any additional operations necessitated by coordinating the operation of parallel components.

Again, bear in mind that computational complexity is a very theoretical measure of computational effort, generally measured as the number of basic operations of some kind required for a given input size. One such basic operation might be the multiplication of two numbers (of limited size). Changing the definition of the basic operation to the multiplication of two vectors of numbers (each of limited size) won't change the complexity of algorithms.

Of course, there is a lot of empirical evidence that parallel algorithms can operate more quickly than their serial counterparts, as you have found.

High Performance Mark
  • See [super linear speedup](http://en.wikipedia.org/wiki/Speedup#Super_linear_speedup) – Sam Harwell Oct 01 '14 at 12:38
  • @280Z28, what Mark is saying is that the complexity of an algorithm is not measured in time but rather in the number of basic operations. The speedup may be there, but in the end the number of operations run by the processors will still be the same. – Rerito Oct 01 '14 at 12:43
  • -1 You assume p is constant, but the OP also asks about cases where the number of processors depends on the input size. This is seriously studied in computer science; what you describe is just the most basic application of complexity theory that has made it down into undergraduate CS courses. See [NC](http://en.wikipedia.org/wiki/NC_%28complexity%29) for example. Also, stressing measuring time vs. measuring basic operations is bonkers: for a constant number of processors it's equivalent, and in other models the time, the size or depth of a circuit, or some other measure can be more useful. –  Oct 01 '14 at 12:44
  • Thanks for that [link to NC](http://en.wikipedia.org/wiki/NC_%28complexity%29#Problems_in_NC), @delnan. That wiki page mentions integer multiplication is in NC, meaning there exists an O((log n)^c) algorithm on O(n^k) processors. Now, the part that's left unknown is what are the `c` and `k` constants for integer multiplication and which parallel algorithm lends itself to this purpose? I'll update the question. – webdevelopersdiary Oct 01 '14 at 15:53